DOCUMENT RESUME

ED 193 955                                  FL 011 933

AUTHOR          Horst, D. P.; And Others
TITLE           An Evaluation of Project Information Packages (PIPs)
                as Used for the Diffusion of Bilingual Projects.
                Volume III: A Prototype Guide to Measuring
                Achievement Level and Program Impact on Achievement
                in Bilingual Projects.
INSTITUTION     RMC Research Corp., Mountain View, Calif.
SPONS AGENCY    Office of Education (DHEW), Washington, D.C. Office
                of Evaluation and Dissemination.
REPORT NO       RMC-UR-460
PUB DATE        May 80
CONTRACT        300-77-0313
NOTE            222p.; For related documents, see FL 011 931-932.

EDRS PRICE      MF01/PC09 Plus Postage.
DESCRIPTORS     *Academic Achievement; Bilingual Education;
                *Diffusion; *Program Effectiveness; Summative
                Evaluation
IDENTIFIERS     *Bilingual Programs; *Project Information Packages



ABSTRACT 

This report describes an evaluation of Project Information Packages
(PIPs), sets of manuals and other materials intended to help a school
district adopt and implement an exemplary education project. Four PIPs
were evaluated in a field test, each PIP describing a different
bilingual project. It was concluded that the awareness materials
produced few applications for PIPs. Field-test sites that received PIPs
tended not to follow PIP guidelines closely but to adapt them
extensively, often with good justification. The bilingual programs at
the sites were collectively successful, but the dissemination effort
could not be judged a success. The present volume is a collection of
specific evaluation guidelines and job aids that were developed for the
use of the field-test sites and that have been organized in the format
of a Prototype Evaluation Manual. This volume should be viewed as a
preliminary draft. It deals in detail only with the evaluation of
student achievement. (Author)









AN EVALUATION OF PROJECT INFORMATION PACKAGES (PIPs)
AS USED FOR THE DIFFUSION OF BILINGUAL PROJECTS



VOLUME III 



A PROTOTYPE GUIDE TO MEASURING ACHIEVEMENT LEVEL 
AND PROGRAM IMPACT ON ACHIEVEMENT IN 
BILINGUAL PROJECTS 



D. P. Horst

D. M. Johnson

H. G. Nava

D. E. Douglas

L. D. Friendly

A. O. H. Roberts






RMC Report UR 460 



USOE Contract No. 300-77-0313



RMC Research Corporation
Mountain View, California 







May 1980 



This report is made pursuant to contract No. 300-77-0313. The amount
charged to the Department of Health, Education, and Welfare for the work
resulting in this report (inclusive of the amounts so charged for any
prior reports submitted under this contract) is $1,113,205. The names
of the persons employed or retained by the contractor, with managerial
or professional responsibility for such work, or for the content of the
report, are as follows:

D. P. Horst
D. E. Douglas
L. D. Friendly
D. M. Johnson
L. H. Luber
H. McKay
H. G. Nava
A. H. Plestrup
A. O. H. Roberts
A. Valdez

The research reported herein was performed pursuant to a contract with
the Office of Education, U.S. Department of Health, Education, and
Welfare. Contractors undertaking such projects under Government
sponsorship are encouraged to express freely their professional judgment
in the conduct of the project. Points of view or opinions stated do not,
therefore, necessarily represent official Office of Education position or
policy.






PREFACE 



This report describes an evaluation of Project Information Packages
(PIPs), a specific type of packaging, as field tested by the United States
Office of Education (USOE) for the diffusion of four bilingual projects.
The field test began with the dissemination of the PIPs in the fall of
1976. The evaluation described here began about nine months later (summer
1977) and continued through the 1978-1979 school year.

This report consists of three volumes, as follows:

Volume I, the Summary Report, comprises (a) an executive summary of
the study questions and findings, (b) an introduction to the study
(Section 1), (c) a non-technical summary of Substudy I, the process
evaluation of the PIP diffusion effort (Section 2), and (d) a
non-technical summary of Substudy II, the evaluation of the impact of the
diffusion effort on students (Section 3). This volume is intended to
provide a self-contained overview of the policy-related study questions
and conclusions.

Volume II, the Technical Discussion and Appendices, documents the
methodology and results of the two substudies and provides more detailed
discussions of conclusions and recommendations. This volume also includes
five appendices: (a) site-by-site results of the process substudy, (b)
site-by-site results of the impact substudy, (c) the complete conceptual
framework used in the process evaluation substudy, (d) a comparative
analysis of the contents of the four bilingual PIPs, and (e) a summary of
the major, mid-study inputs from the study advisory panel.

Volume III, the present volume, is a collection of specific evaluation
guidelines and job aids that were developed for the use of the field-test
sites and which have been organized in the format of a Prototype
Evaluation Manual. This volume should be viewed as a preliminary draft
rather than a finished product. Further, it deals in detail only with the
evaluation of student achievement, which is only one component of a
complete bilingual program evaluation.






NOTE



This volume comprises many of the specific achievement-evaluation
guides and worksheets developed for the field-test sites in the
bilingual-PIP evaluation. These materials are presented here in the form
of an evaluation manual, in order to highlight the major issues that arose
in the course of the field test and to illustrate the kinds of solutions
that RMC has proposed.

This prototype manual is intended primarily to stimulate discussion
on ways to resolve the major, practical problems in bilingual program
evaluation. Many technical issues remain to be settled and, in addition,
an extensive process of format development, tryout, and revision would
be required before this manual could be considered ready for general use.
Nevertheless, until more comprehensive, specific evaluation guidelines
for bilingual programs are developed, we believe that these materials may
be of use to some bilingual-program personnel.






CONTENTS

AN ORIENTATION TO THE EVALUATION PROBLEMS AND THE MANUAL (SECTION 0)

PLANNING AND BUDGETING THE EVALUATION (SECTION 1)

FORMALIZING PROGRAM GOALS (SECTION 2)

DESCRIBING THE PROGRAM (SECTION 3)

CHOOSING AN EVALUATION DESIGN (SECTION 4)

DESCRIBING STUDENT SKILLS FOR SELECTION AND DATA ANALYSIS (SECTION 5)

SELECTING TESTS (SECTION 6)

COLLECTING DATA (SECTION 7)

ANALYZING THE DATA AND REPORTING THE RESULTS (SECTION 8)

APPENDICES



CONTENTS

0. AN ORIENTATION TO THE EVALUATION PROBLEMS AND THE MANUAL

   Introduction to the Manual
   A Closer Look at the Evaluation Questions
   The Problems in Answering the Questions
   Popular "Solutions" That Do Not Work
   The Positive Side
   Contents of the Manual in Brief

1. PLANNING AND BUDGETING THE EVALUATION

   1-A. A Quick Estimate of Evaluation Costs (Worksheet)

2. FORMALIZING PROGRAM GOALS

   2-A. Writing Measurable Goals for Student Achievement (Worksheet)
   2-B. Potential Benefits of Bilingual Education Programs (Checklist)

3. DESCRIBING THE PROGRAM

   3-A. Model for Bilingual Project Description
   3-B. Classroom Observation Guide
   3-C. Category System for Describing Reading Treatment in
        Bilingual Projects
   3-D. Format for Reporting Instructional Treatment

4. CHOOSING AN EVALUATION DESIGN

   4-A. A Guide to Evaluation Designs

5. DESCRIBING STUDENT SKILLS FOR SELECTION AND DATA ANALYSIS

   5-A. Demographic and Biographic Information Worksheet

6. SELECTING TESTS

   6-A. Selecting an Achievement Test
   6-B. Selecting a Language Proficiency Test
   6-C. A System for Comparing Curriculum Content with the Content
        of CTBS Spanish and English, Forms B and C

7. COLLECTING DATA

   7-A. Data Collection Procedures (Checklist)
   7-B. Data Recording Forms

8. ANALYZING THE DATA AND REPORTING THE RESULTS

   8-A. Data Analysis Checklist
   8-B. Report-Writing Checklist for Bilingual-Program Evaluators
   8-C. Sample Data-Reporting Tables

APPENDIX A: HOW BIG ARE ACHIEVEMENT GAINS?

APPENDIX B: GUIDELINES FOR OTHER EVALUATION AREAS

   B-1. Evaluation of Affective Impacts
   B-2. Evaluation of Staff Development Activities
   B-3. Evaluation of Parent/Community Involvement

REFERENCES






AN ORIENTATION 
TO 

THE EVALUATION PROBLEMS AND THE MANUAL 
Introduction 



An Unconventional Manual

This is not a basic guide to achievement evaluation. It is an attempt
to fill in the gaps between the evaluation theory that all evaluators
know and the realities of evaluating achievement in bilingual programs.
Many of the problems addressed in the manual apply to evaluations of all
types of programs but, in most cases, we have attempted to relate them to
bilingual programs.



The Perspective of the Authors 

The authors of this manual have reviewed numerous evaluation reports
from many types of programs and have found that most fail to provide
convincing evidence of program impact or even to provide interpretable
information on student achievement levels. Many reports paint overly rosy
pictures of general program effectiveness while failing to isolate
specific, positive impacts that may actually have occurred.

While recognizing the many problems involved in evaluating bilingual
programs, and the time and financial constraints under which most
evaluators must work, we believe that far more meaningful achievement
evaluations are possible in most school districts. This prototype manual
is the result of attempting to develop such evaluations in 19 school
districts over a two-year period. It is organized around the major,
practical obstacles encountered by these districts and provides specific
recommendations on how to deal with the problems. The intention is to
describe the best available approaches for those districts that have both
the commitment to meaningful achievement evaluation and the resources
that are required, and to dissuade districts that lack the commitment or
resources from expending their efforts on the collection of misleading
information.



The Intended Audience 

This orientation section is intended for the local project director
or for the federal, state, or local administrator who wants a brief
overview of what is currently possible and what is not possible in
bilingual-program achievement evaluation. It should also orient the
professional evaluator as to where the authors stand on some of the
controversial, technical issues in this field.

The body of the manual is intended for the project director and
evaluator who must work together to plan and implement an evaluation. In
general, it assumes that the evaluator is familiar with bilingual
education and with basic principles of statistics and evaluation design.



The Evaluation Questions Addressed 



This draft manual is not a complete guide to bilingual-program
evaluation. It does not yet include such important topics as process
evaluation or mastery testing. It focuses only on the evaluation of
student achievement and, in particular, on two kinds of questions:

• What is the level of student performance relative to national
norm groups or other groups of interest?

• What is the impact of the bilingual program on student achievement
compared to other local instruction, past or present?



The Contents and Organization of the Manual 

This is an unconventional guide to evaluating bilingual programs. It
deals in depth only with selected, key problems that are either unsolved
or widely overlooked in current evaluations. Where some manuals emphasize
evaluation principles that are frequently impossible to apply in practice
(leaving the decisions and the mistakes to the local evaluator), this
manual recommends specific, practicable (though often imperfect)
solutions.

Contents. Following this orientation section, the body of the manual
is divided into sections according to eight problem areas:

• Planning and budgeting the evaluation

• Formalizing program goals

• Describing the treatment

• Choosing evaluation designs

• Selecting and describing students

• Selecting tests

• Collecting data

• Analyzing the data and reporting the results

In addition, Appendix A contrasts the sizes of errors and effects in
educational evaluations, and Appendix B includes sections on evaluating
student attitudes, staff development, and parent/community components.
These topics would form major sections in a more comprehensive program
evaluation manual.

Organization. Each section is introduced by a two-page overview
identifying the one or two key problems in the area. In most overviews,
several additional problems are listed for reference with little or no
discussion. The eight two-page overviews constitute the "discussion"
sections of the manual.

The remainder of each section is a collection of separate checklists,
worksheets, and other items intended to help the project director or
evaluator solve the key problems.

The section overviews are relevant to both project directors and
evaluators. The various items included in each section may be more
relevant to one or the other.






A Closer Look at the Evaluation Questions Addressed in This Manual



A Typical "Question-Free" Approach

An evaluation must be planned to answer specific questions if it is
to be of any value. Too often, bilingual-program evaluations are not
designed to answer any clearly defined question. They are done because
they are required by the funding agency, or because someone at the local
level believes that evaluation, in some general sense, is important.
Frequently, the subject matter to be evaluated and the tests to be used
are determined by the test scores available from the district-wide
testing program. Everyone knows that "objectives" are important, so an
arbitrary gain in raw-score points is chosen. The results, probably
positive, are described in an annual evaluation report and may be
accepted by the readers as evidence of program effectiveness. In fact,
such results mean nothing at all.

Many evaluations add such refinements as comparison groups (which
usually turn out to be non-comparable) and language testing (which is
ignored in analyzing the reading and math scores). These refinements may
provide the raw material for answering some interesting questions but,
unless the questions are clearly defined in the minds of the evaluators,
they usually are not answered to the satisfaction of the critical reader.



Different Questions That May Be Asked 

There are many different evaluation questions and sub-questions that
may be asked, and they overlap and interact in complex ways. Evaluation
experts generate heated debates over which questions should be asked.
The body of this manual addresses two kinds of student achievement
outcome questions: (a) the performance level of students relative to
other groups of interest, and (b) the impact of a particular program
relative to other programs or instructional treatments of interest.

These questions were selected because they are of central interest
for many decision makers and because the procedures for answering them
are closely related. Other important achievement questions include (a)
whether program objectives were met and (b) whether student skills are
adequate for higher education, jobs, or general survival in society.
There are also questions of student attitudes, staff development, and
parent/community involvement, and of course there are the major areas
of cost and process evaluations. All of these questions should be
considered in planning a complete bilingual program evaluation. However,
only the performance-level and impact questions are covered in Sections
1-8 of this manual.

Student performance-level questions. Establishing relative
performance levels is important because national and local comparison
groups, where available, can provide realistic standards for program
results. Parents, school boards, district administrators, and
bilingual-program staff may all wish to know how program students compare
in reading, language, math, and other subjects to:

• the national average (for bilingual program students and non-bilingual
program students)

• state and local averages

• other students in the same school

• bilingual program students in previous years.

In order to get meaningful answers to such questions, it is necessary to
have (a) accurate measures of performance for both program and comparison
students, plus (b) a clear understanding of the similarities and
differences between the program and comparison students. Answering these
questions requires careful planning and implementation of the evaluation,
but is usually possible in most school districts for English-language
subjects. It is usually impossible for native-language instruction.
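As a minimal sketch of the performance-level question, the Python below
places a program group relative to a national norm group. It assumes,
purely for illustration, that the norms for one test level and testing
date are approximately normal with a hypothetical mean of 500 and SD of
50; in practice the publisher's norms tables would replace the `to_nce`
function. Scores are averaged as normal curve equivalents (NCEs, mean 50,
SD 21.06) rather than as percentile ranks, because percentile ranks are
not equal-interval and should not be averaged. All scores and names are
hypothetical.

```python
from statistics import NormalDist, mean

NORM_MEAN = 500.0  # hypothetical norm-group mean scale score
NORM_SD = 50.0     # hypothetical norm-group standard deviation
NCE_SD = 21.06     # NCEs: normalized scores with mean 50 and SD 21.06

def to_nce(scale_score, norm_mean=NORM_MEAN, norm_sd=NORM_SD):
    """Convert a scale score to an NCE, assuming normally distributed norms."""
    z = (scale_score - norm_mean) / norm_sd
    return 50.0 + NCE_SD * z

def group_standing(scores):
    """Return the group's mean NCE and the national percentile of that mean."""
    mean_nce = mean(to_nce(s) for s in scores)
    percentile = 100.0 * NormalDist().cdf((mean_nce - 50.0) / NCE_SD)
    return mean_nce, percentile

program_scores = [430, 455, 470, 480, 495, 510]  # hypothetical students
nce, pct = group_standing(program_scores)
print(f"mean NCE = {nce:.1f}, about the {pct:.0f}th percentile")
```

Here the hypothetical group's mean NCE is about 38.8, roughly the 30th
percentile of the assumed norm group. Such a figure is only as meaningful
as the match between the norm group and the program students.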

Program impact questions. Performance-level questions are an attempt
to determine whether students are doing well or not in some absolute
sense. Impact questions ask whether performance levels are due to the
program; in other words, is the program "effective"? Explicitly or
implicitly, this question underlies most evaluation designs. It is a
question of great interest to local and funding-agency decision makers.
However, it is an extremely difficult and expensive question to answer,
and very few evaluations succeed in providing convincing evidence on the
presence or absence of program impacts.

A major complication in defining the impact question concerns the
concept of a "program." If we are looking for effective programs with
the intention of spreading them throughout the district (or perhaps to
other districts), then the "program" includes only the plans, procedures,
and materials that can be exported. Research suggests, however, that the
personnel are usually more important than procedures and materials except
in a few highly structured programs such as DISTAR. The distinction
between the effects of staff and the effects of program procedures and
materials is very important to decision makers and project directors who
are trying to improve student learning. It is especially important to
know that a clearly effective program from one school may not work well
in a new setting.

The other conceptual issue concerns the standard of comparison for
the program impact. Program impact is generally measured against a
"no-treatment expectation," that is, an estimate of how students would
have performed without the new special program. In the case of
bilingual-program students, of course, there is always some treatment,
whether it is the regular, all-English classroom program or an
alternative special program. It may well be of interest to know how the
same children would have fared under these alternatives. However, in
this manual, unless otherwise stated, program impact will refer to the
situation in which a new program is installed and the question is whether
student performance






The Problems in Answering the Questions



The Five Major Pitfalls

In practice, there are five major problem areas that destroy the
credibility of evaluations of all types of programs. These problem areas
are:

Undefined program goals and objectives. Much has been written about
exactly how to set goals and define objectives. However, we are only
concerned here that some reasonable statement of "what is taught when" is
available to the evaluation planners. Otherwise, especially in bilingual
programs, students may be tested in languages they do not know or on
subjects they have not encountered. The problem of goals that are not
specified (or are not utilized in planning and reporting the evaluation)
is widespread. It applies equally to performance-level and impact
evaluations. While conflicting legal, political, and social influences
make the program planner's task difficult, there are no insurmountable
technical problems in spelling out what is being taught. Project
directors and evaluators simply fail to do it.

Inappropriate tests. Obtaining appropriate tests is a serious problem
in performance-level evaluations. Language proficiency tests are the
subject of considerable controversy among the experts. Theoretical
problems can be found with all language tests, and many are difficult
and costly to administer. Standardized math computation tests may
provide acceptable measures if instructions are translated where required
(note also the suitability of the norms, discussed under "comparison
groups," below). Standardized English reading tests may also be
acceptable, but interpretation is unclear when students have learned to
read in another language first. For example, a first-grade English
reading test is probably not appropriate for a fourth-grade student who
has just begun to learn English but reads fluently in another language.
Reading tests are simply unavailable for many other languages. Using
test levels that are too difficult or too easy often distorts evaluation
results.
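The distortion from a poorly chosen test level can be illustrated with
a short Python sketch. It assumes a hypothetical 40-item test that is too
easy for some students: their scores pile up at the maximum (a ceiling
effect), so the observed gain understates the gain the students actually
made. The scores and the `MAX_SCORE` value are invented for illustration.

```python
from statistics import mean

MAX_SCORE = 40  # hypothetical number of items on the test level used

# Hypothetical scores the students would earn on a test with no ceiling.
pre_true = [28, 32, 35, 38, 41]
post_true = [34, 38, 41, 44, 47]  # every student gains exactly 6 points

def observed(scores, max_score=MAX_SCORE):
    """Clip scores to the possible range: no one can score below 0
    (floor) or above the number of items (ceiling)."""
    return [max(0, min(s, max_score)) for s in scores]

true_gain = mean(post_true) - mean(pre_true)
observed_gain = mean(observed(post_true)) - mean(observed(pre_true))

print(f"true mean gain:     {true_gain:.1f}")
print(f"observed mean gain: {observed_gain:.1f}")
```

In this example the test registers a mean gain of 3.8 points when the
students actually gained 6, because three of them "topped out" on the
posttest. A level on which students cluster at the top, or at zero,
cannot register their progress.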

For impact evaluations, the testing problems are compounded, because
impact evaluations involve the comparison of two or more different
instructional treatments. In this situation, any given test will probably
favor one treatment over the other simply because it matches the
instructional content better. We can speculate that changing tests might
even reverse the conclusions as to which treatment is better but, in
fact, no one really knows.

In theory, the testing problems can be solved through further test
development but, as a practical matter, the evaluator cannot now find a
completely satisfactory set of tests. The problem is further complicated
because, in many districts, evaluators are restricted to using tests from
the district-wide program.







Lack of comparison groups. For performance-level evaluations,
national norms are available for English reading and math, although their
relevance to a particular group of bilingual-program students may be open
to question. Relevant norms are not widely available for English
language proficiency tests, although most language proficiency tests have
some form of built-in standards. Local comparison groups obviously exist
but, even where they are clearly of interest, it may be difficult or
impossible for administrative reasons to obtain the necessary test scores
and other descriptive information.

Historically, impact evaluations have implied the need for randomized
control groups. In most cases, bilingual-program regulations effectively
prohibit such groups. In any case, it now appears that, in a school
setting, assignment to a control group may constitute a negative
treatment and thus create an inappropriate comparison. Impact studies
probably require careful collection of baseline data for several years
prior to the start of a new program. For all practical purposes, this
rules out any form of precise impact evaluation except in districts that
collect these data as a routine matter.
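A minimal Python sketch of how several years of baseline data can yield
a no-program expectation. It assumes hypothetical grade-level mean scores
(on an NCE scale) for four pre-program cohorts tested with the same test,
level, and testing dates, and projects a simple straight-line trend to the
first program year. The linear trend is only one possible projection,
chosen here for simplicity; with so few points it is poorly determined,
which is part of why several years of careful baseline collection are
needed.

```python
# Hypothetical grade-3 reading means (NCEs) for cohorts tested before the
# new program began; same test, level, and testing dates every year.
baseline = {1973: 38.0, 1974: 39.0, 1975: 39.5, 1976: 40.5}
program_year, program_mean = 1977, 44.0  # hypothetical first program cohort

def linear_trend(points):
    """Ordinary least-squares fit; returns (slope, mean_x, mean_y)."""
    xs = list(points)
    ys = [points[x] for x in xs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx, x_bar, y_bar

slope, x_bar, y_bar = linear_trend(baseline)
expected = y_bar + slope * (program_year - x_bar)  # no-program expectation
difference = program_mean - expected

print(f"expected {expected:.2f}, observed {program_mean:.2f}, "
      f"difference {difference:+.2f}")
```

A difference of this kind is evidence of impact only if nothing else
changed in the program year, such as the tests, testing dates, student
population, or staff.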

Careless testing procedures. The need for careful testing procedures
applies to all forms of evaluation and is well understood by virtually all
evaluators. With the possible exception of language testing, there are
few technical problems involved. However, a great deal of effort is
involved in proper testing, and it is often difficult to focus this
amount of effort on testing activities. The result is that questionable
testing procedures jeopardize the credibility of many evaluations.

Improper analysis and inadequate reporting. Data analysis and
reporting are the final links in the evaluation process. Like test
administration, data analysis and reporting present no major technical
problems. All that is required is the thoughtful application of existing
evaluation methodology and the careful documentation of what was done.
Nevertheless, few evaluation reports are even marginally adequate. In
general, the evaluation questions are not clearly defined, information on
the tests and testing procedures is incomplete, inappropriate evaluation
models are applied, and little or no attempt is made to tie results to
program features. Of all the major pitfalls, however, this one may be the
easiest to eliminate. Given adequate incentives and guidelines, most
evaluators could produce satisfactory analyses and reports.






Additional Obstacles Related to Program Characteristics



In addition to the five pitfalls above, which apply to evaluations
of all special programs, there are four widespread problems that are of
concern in many bilingual program evaluations:

Evaluation of fragments of a program. Most bilingual programs are
designed to cover several grades, and many important skills are not even
introduced at the lowest grades. Such programs are often started at the
lower grade levels and expanded upward, one grade per year. A K-6
program cannot be evaluated by observing one or two of the lower grades.
A long-term longitudinal study is required, but such studies present many
problems. In fact, student turnover makes most program evaluations
longitudinal in theory only.

Evaluation of new programs. Bilingual education is characterized by
new and constantly evolving programs. There is a great deal of pressure
to provide immediate evidence of positive results, but there is simply no
way to do a meaningful outcome evaluation of a program that is only
partially in place or is in a state of flux.

Variation in instructional treatment. Treatment in bilingual programs
often varies widely among students, even within a single classroom.
Meaningful evaluation requires a clear understanding of what happens to
each student, but when instruction is described clearly, it may become
obvious that only a few students received any one treatment. The
different groups may be too dissimilar to aggregate but too small to
analyze separately.
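The "too small to analyze separately" problem can be made concrete with
a rough margin-of-error calculation. This Python sketch assumes,
hypothetically, that scores are expressed as NCEs and that program
students vary as much as the norm group (SD 21.06); it uses the normal
approximation (z = 1.96) for a 95% interval, so for very small groups a
t-based interval would be wider still.

```python
from math import sqrt

NCE_SD = 21.06  # assumed spread: the norm-group SD of NCE scores

def ci_half_width(n, sd=NCE_SD, z=1.96):
    """Approximate 95% margin of error for the mean of n students' scores."""
    return z * sd / sqrt(n)

for n in (5, 15, 60):
    print(f"n = {n:2d}: group mean NCE is uncertain by about "
          f"+/- {ci_half_width(n):.1f}")
```

With five students in a treatment group, the group mean is uncertain by
roughly plus or minus 18 NCEs under these assumptions, so only a very
large treatment difference could be distinguished from chance.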

Testing of young children. The testing of young children, especially
those below the third grade, is notoriously difficult. Many bilingual
programs, however, focus heavily on the lowest grades. There is no
obvious answer to this problem.



Evaluator Availability and Turnover

An adequate evaluation requires a lot of evaluator time and careful
adherence to a long-range plan. These requirements are difficult to meet,
especially in small districts. Frequently, the evaluator has very limited
time and resources; when he or she leaves to take a new job, the work of
several years may be lost. While these are management problems rather
than technical ones, they are major causes of inadequate program
evaluations.



Popular "Solutions" That Do Not Work



The frustrations generated by the kinds of problems described above
have led to many misguided attempts at solutions. Some fail to answer
the impact question but do answer other questions of possible interest.
Others are of no use at all.



Approaches That Should Never Be Used

Posttest minus pretest. In lieu of any better ideas, many evaluators
simply subtract pretest scores from posttest scores and compute the
significance of the difference. Since almost all groups of children make
some gains, even when they are falling rapidly behind their peers, this
approach is of no value at all. A popular variation, selecting a gain of
some arbitrary number of raw-score points as the program target, is no
improvement.
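The arithmetic behind this warning can be shown with a small Python
example. Five hypothetical students gain 5 raw-score points, and a paired
t statistic calls the gain highly "significant," yet relative to an
assumed norm group (hypothetical fall and spring means of 30 and 38
raw-score points, SD 8) the group's standing drops. The norms and scores
are invented; real ones would come from the test publisher's tables.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical raw scores for the same five students on one test level.
pre = [22, 24, 26, 28, 30]
post = [26, 29, 31, 33, 36]

# Hypothetical norm-group (mean, SD) at the fall and spring testing dates.
FALL_NORM = (30.0, 8.0)
SPRING_NORM = (38.0, 8.0)

gains = [b - a for a, b in zip(pre, post)]
mean_gain = mean(gains)
t_stat = mean_gain / (stdev(gains) / sqrt(len(gains)))  # paired t statistic

def z_score(score, norm):
    """Standing of a score relative to a norm group, in SD units."""
    norm_mean, norm_sd = norm
    return (score - norm_mean) / norm_sd

z_pre = z_score(mean(pre), FALL_NORM)
z_post = z_score(mean(post), SPRING_NORM)

print(f"raw gain = {mean_gain}, paired t = {t_stat:.1f}")
print(f"standing vs. norm group: z {z_pre:+.2f} -> {z_post:+.2f}")
```

The positive, "significant" raw gain and the falling relative standing
are both arithmetically correct; only the second bears on whether the
students are keeping pace with their peers.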

Grade-equivalent scores (the month-for-month-gain myth). Analyses
based on grade-equivalent scores still, unfortunately, appear all too
frequently. They are based on the mistaken belief that a gain in test
scores of one or more months for each month of instruction represents
good progress. This is not true. Grade-equivalent scores provide an
illusion of simplicity but, in fact, they are almost impossible to
interpret, even for specialists in test construction. Grade-equivalent
scores should never be used by anyone for any purpose whatsoever.

IQ-based formulas. From time to time, an attempt to use IQ scores
appears as the basis for evaluating reading or math performance. The
idea that IQ tests provide an absolute standard against which to compare
a specific skill is simply a misunderstanding. IQ-based formulas are not
appropriate for use in bilingual program evaluations.

Subjective data . As a last resort, evaluators sometimes fall back 
on subjective data, usually teacher reports. While such reports are 
always useful in interpreting results, they can never be assumed to repre- 
sent reliable, valid measures of student performance. 



Approaches That Are Widely Misused 

Criterion-referenced testing. A great deal of mystique has been
generated around criterion-referenced tests, and some evaluators suggest
that they solve the major problems faced by evaluators. Actually, what
the criterion-referenced-test advocates have done is to change the
question that is being asked. Criterion-referenced tests may provide
information as to whether program objectives have been met (although those
objectives may be quite arbitrary). Measuring performance level or
program impact still requires reliable, valid tests with an adequate range
of difficulty (no floor or ceiling effects). In principle, criterion-
referenced tests could meet these requirements but, in practice, most do
not.
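One rough screen for floor and ceiling effects is to count how many students get the lowest or highest possible score. The sketch below assumes a hypothetical 40-item test and an arbitrary 10% warning threshold. Neither figure comes from this manual. You would still need to look more closely at the score distribution before drawing conclusions.

```python
def floor_ceiling_rates(scores, min_score, max_score):
    """Return the proportions of scores at the test's floor and ceiling."""
    n = len(scores)
    at_floor = sum(1 for s in scores if s <= min_score) / n
    at_ceiling = sum(1 for s in scores if s >= max_score) / n
    return at_floor, at_ceiling

# Hypothetical raw scores on an assumed 40-item test (possible range 0-40)
scores = [0, 0, 3, 12, 18, 22, 25, 31, 40, 40]
floor_rate, ceiling_rate = floor_ceiling_rates(scores, 0, 40)

THRESHOLD = 0.10  # assumed cutoff, chosen only for illustration
if floor_rate > THRESHOLD:
    print(f"possible floor effect: {floor_rate:.0%} at the minimum score")
if ceiling_rate > THRESHOLD:
    print(f"possible ceiling effect: {ceiling_rate:.0%} at the maximum score")
```

Here 20% of the students score at each end of the scale. A test this poorly matched to the group cannot show how far those students have moved.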






Gap-reduction models. "Gap reduction" is a term that appears in the
bilingual program evaluation literature. It usually means either (a)
students get closer to the national norms, or (b) students get closer to
some dissimilar comparison group. The former is simply a special case
of the norm-referenced model, which is useful for performance-level
evaluation but generally not for program-impact evaluation. The latter is
an example of non-random comparison groups (see below). The important
point is that "gap reduction" is simply a new name for familiar designs.
The new name does not change their strengths or weaknesses.

Non-random comparison groups. Many bilingual program evaluations
make use of non-random comparison groups, that is, different kinds of
students who are receiving different instructional treatments. As part of a
performance-level evaluation, such comparisons may be of great interest to
local decision makers and program staff. In general, however, such
comparisons do not by themselves provide program-impact information
because student differences are confounded with program differences.

Time-series or historical data designs. In combination with non-
random comparison groups, time-series designs may provide the most
accurate program-impact evaluations. However, the baseline (pre-program)
data must be collected very carefully. Long-term planning is usually
required well before the start of a new program, and this presents a
serious practical problem. For example, none of the 19 sites in the
bilingual-PIP field test had the data on hand to enable them to use this
design.
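The basic logic of a historical design can be sketched in a few lines. You project the pre-program trend forward and compare the program-year result with that projection. The yearly means below are hypothetical, and the straight-line trend is an assumption. Real baseline data would need the careful collection described above.

```python
def linear_trend(years, means):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(means) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, means))
    sxx = sum((x - x_bar) ** 2 for x in years)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical grade-level means for three pre-program (baseline) years
baseline_years = [1, 2, 3]
baseline_means = [30.0, 31.0, 32.0]

slope, intercept = linear_trend(baseline_years, baseline_means)
projected = intercept + slope * 4   # expected year-4 mean without the program
observed = 35.0                     # hypothetical mean in the first program year
print(f"projected {projected:.1f}, observed {observed:.1f}, "
      f"difference {observed - projected:+.1f}")
```

The baseline rises one point a year, so a program-year mean of 35 sits two points above the projected 33. You can credit that difference to the program only if the baseline data are sound and nothing else changed in the program year.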






The Positive Side



The preceding pages have painted a very bleak picture of the current
state of achievement evaluation. At this point, you may be asking if
things can really be this bad and, if so, whether there is any point in
ever trying to evaluate student achievement. The answers are (a) that
the current situation is definitely as bad as we have painted it, but (b)
that useful evaluations are quite possible in almost every school district.
However, there are two important qualifications to the latter answer:

• You must be clear about the questions you are trying to answer,
and it may not be feasible in your district to answer every
question that you would like to answer. Requirements for
answering two kinds of questions are summarized below.

• You must attend to all of the requirements for carrying out a
useful achievement evaluation. These requirements are organized
under eight section headings in this manual. The section
contents are summarized under "Contents of the Manual in Brief."

The Questions of Program Impact on Achievement 

These questions are very difficult and costly to answer in most
districts. It is not overly difficult to determine the performance level of
the students (see below), but it is very difficult to determine how much
a particular program has contributed to the performance level. For
example, to determine the impact of a new program on achievement, you must
have accurate data on both program and non-program students, both before
and after the introduction of the program. If the program is the only
change in the district, and if the achievement of program students improves
in relation to the non-program students, then you can probably conclude
that the program produced the improvement.
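The comparison described above comes down to one figure: how much more the program students gained than the non-program students over the same period. Below is a minimal Python sketch with hypothetical group means. It assumes both groups took the same test at the same times. It does not show whether the groups were comparable to begin with.

```python
# Hypothetical mean scores; not data from the PIP field test.
program = {"pre": 28.0, "post": 36.0}
non_program = {"pre": 30.0, "post": 35.0}

program_gain = program["post"] - program["pre"]              # 8.0
non_program_gain = non_program["post"] - non_program["pre"]  # 5.0
relative_improvement = program_gain - non_program_gain       # 3.0

print(f"program gain {program_gain:+.1f}, "
      f"non-program gain {non_program_gain:+.1f}, "
      f"difference {relative_improvement:+.1f}")
```

The 3-point difference supports an impact conclusion only under the condition stated above. The program must be the only change affecting one group and not the other.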

The Questions of Student Performance Levels

It is quite possible to get good measures of student performance
levels, simply by following the procedures described in this manual. This
information can tell you where program students stand in relation to other
students in the district or in publishers' norm groups. It will also let
you compare the achievement levels of program students from year to year.
It will probably not tell you with any certainty whether the program was
the cause of improvements or declines, but it may alert you to problems or
support your judgments about program effects.
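A percentile rank is one way to summarize where program students stand relative to other district students. The sketch below uses one common definition: the percent of scores below, plus half of those tied. The district scores are hypothetical. For a comparison with a national norm group, a publisher's norms table would replace the district list.

```python
def percentile_rank(score, reference_scores):
    """Percent of reference scores below `score`, counting ties as half."""
    below = sum(1 for s in reference_scores if s < score)
    equal = sum(1 for s in reference_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(reference_scores)

# Hypothetical district scores and program-group mean
district = [18, 22, 25, 27, 30, 30, 33, 36, 40, 44]
program_mean = 30

rank = percentile_rank(program_mean, district)
print(f"program mean is at the {rank:.0f}th percentile of the district")
```

Repeating the same calculation each year with the same test gives the year-to-year comparison described above. It will not, by itself, say why the rank changed.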

Other Questions 

Of course there are other important evaluation questions that should
be answered, and some of them are less difficult to answer than are the
achievement questions. These would include whether the program is
operating as planned (process evaluation) and whether the students are
achieving specific instructional objectives. A complete evaluation addresses
all of these questions.








Contents of the Manual in Brief 



1. Planning and budgeting the evaluation. Evaluating a bilingual
program involves developing a design, selecting tests, training testers,
supervising data collection, analyzing data, writing reports, and
presenting findings to district personnel and program staff. Some
evaluators are also asked to train staff in diagnostic procedures. Few
budgets permit evaluators to work in any depth on the basic problems
that exist in each of these areas (see below). Many evaluators have
too little time for even a minimal effort on the complete set of tasks
listed above. This section provides rough guidelines as to what can be
accomplished at different levels of effort.

2. Formalizing program goals. Few programs have written down
exactly what they hope to accomplish (with rationales explaining why
their instructional approach should succeed) in each subject area and
grade level. This section describes the kinds of brief, clear goals
that are essential for selecting tests and interpreting evaluation
results. Rationales are treated in the next section.

3. Describing the treatment. In order to interpret results, the
evaluator and the audience for the report must know how much time the
students spent on each topic, and exactly how that time was spent.
Reports often include discussions of gains (or losses) for students who
actually received no special training in relevant areas. This section
provides a detailed list of program features that should be considered
by the evaluator and summarized in the evaluation report.

4. Choosing experimental designs. An experimental design is a
formal statement of a question or set of questions that the evaluation
is intended to answer. As discussed above, the most important feature
of a design for evaluating a bilingual project is the control group or
other standard of comparison. It was pointed out that there usually is
no practicable way to get a precise answer to the basic question of
whether students do better in a program than they would have done
without it. However, important questions such as whether students are
doing "well enough," "better than last year," "better than students in
other projects," and so on, can often be answered precisely enough for
practical purposes. This section describes the basic questions that
can be answered, and the designs that will answer them.

5. Selecting and describing students. The ways in which students
are selected influence evaluations because they determine the
characteristics of the program groups and the availability of comparison
groups. In addition, descriptions of student backgrounds and
educational experiences are needed in order to interpret the results of the
program evaluation. This section explains some of the ways in which
student selection approaches affect impact evaluation designs and
indicates the information required for each student in order to
understand her or his performance in the program.






6. Selecting tests. The particular tests used to measure language,
reading, math, and perhaps other skills are of extreme importance
if an evaluation is to provide any useful information at all. A
great deal has been written about the principles to follow in choosing
tests, and these principles must be observed. In practice, however,
there are very few acceptable tests to choose among, and the most
important thing is to pick a test that is relevant to what is taught in
the program. This section lists the basic features to look for, names
the most widely used tests, and provides an example of a detailed
analysis of test content.

7. Collecting data. Experience in Title I and other evaluations
has shown that the size of program impacts on student scores may be as
little as a few raw-score points, even in a very good program. This
may also be the case in bilingual education programs. Major violations
of correct testing and scoring procedures appear to be very common in
school settings and may often be the real sources of apparent program
successes or failures. This section reprints a list of basic testing
and scoring rules that must be followed for a meaningful evaluation.



8. Analyzing the data and reporting the results. Convincing
evaluation reports from bilingual programs (or, for that matter, from
any educational programs) are almost non-existent. In general,
appropriate data analysis consists of determining if there appear to be
program impacts and, if so, exactly what causes the apparent impacts.
Only the simplest of statistical techniques are required in many cases,
although in some evaluations, multiple regression techniques may be
required to adjust for students' socioeconomic status, educational
background, and other relevant factors. Careful, common-sense
detective work is always required in locating causes. This section suggests
which analyses to conduct and what to include in the report.
References to more detailed treatments of data analysis procedures are also
included.
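The sketch below illustrates the regression adjustment mentioned in item 8. It fits posttest score on a program indicator and an SES index by ordinary least squares, written in plain Python so it runs without a statistics package. The data are hypothetical and built so that score = 50 + 5 × program + 2 × SES exactly. A real evaluation would use a standard statistical package. The adjustment is also only as good as the background factors actually measured.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical students: (in program? 1/0, SES index) and posttest score.
# In this made-up example, program students come from lower-SES families.
students = [(1, 1), (1, 2), (1, 3), (0, 3), (0, 4), (0, 5)]
scores = [57.0, 59.0, 61.0, 56.0, 58.0, 60.0]

design = [[1.0, in_program, ses] for in_program, ses in students]  # intercept, program, SES
intercept, program_effect, ses_slope = ols(design, scores)
raw_difference = sum(scores[:3]) / 3 - sum(scores[3:]) / 3

print(f"unadjusted difference: {raw_difference:+.1f}")
print(f"SES-adjusted program effect: {program_effect:+.1f}")
```

The program students here are concentrated at the low end of the SES index. As a result, the unadjusted difference is only 1 point, while the adjusted program coefficient recovers the 5 points built into the data.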




PLANNING AND BUDGETING THE EVALUATION 



Major Content Item

1-A. A QUICK ESTIMATE OF EVALUATION COSTS (WORKSHEET)






1. PLANNING AND BUDGETING THE EVALUATION



An evaluation design for a bilingual program may be as simple as an
outcome evaluation to meet minimum Title VII requirements or as complex
as a complete process and outcome feedback system for teachers, project
directors, and district personnel. We assume here that a decision has been
made as to the general nature of the evaluation and that some performance-
level and/or impact evaluation will be included.

One of the first planning steps for the project director is to check
on available evaluation resources in the district and, assuming she or he
has the authority, to decide whether to use an in-house evaluator or hire
someone from outside. The possibility of obtaining special skills or
facilities from a university or private evaluation specialist must be weighed
against the potentially lower cost to the project budget and more
convenient working relationship with an evaluator on the district payroll.



Key Problem: Specifying Evaluation Tasks

Perhaps the major evaluation planning problem encountered
in the PIP field test was the discrepancy between (a) the many
evaluation tasks that needed doing, and (b) the time and
resources available to do the job. Typically, evaluators are
expected to help in designing the evaluation and selecting
tests, and have full responsibility for training testers,
supervising testing, analyzing data, writing reports, and presenting
results to district personnel. Many are also involved in process
evaluations and in providing feedback to teachers. Budgets for
such services may cover less than 25 days of evaluator time, far
too little for anything more than a superficial effort. While each
district must decide on its own evaluation needs, this section
provides some rough guidelines as to how much such services will
cost from external evaluators.






Related Issues



Long-range planning. Evaluation of bilingual programs requires long-
term, longitudinal evaluation designs. The evaluations should be planned
and budgeted on this basis.

Evaluator attrition. From time to time, evaluators resign and must
be replaced. Occasionally this happens in the middle of the year. Unless
all evaluator activities are carefully documented, or someone who remains
with the project is thoroughly familiar with what has been done, you can
expect to lose much of the evaluation for the year. This possibility
must be weighed against the substantial costs of documenting every step
or involving a second person in the evaluation.

Conforming to district policies. In many districts, test selection
and other key evaluation decisions are constrained by district policies,
and the credibility of bilingual program evaluation may suffer. The
project director and evaluator must use good judgment as to whether the
potential improvement to the evaluation justifies the problems in
deviating from district practices. Where deviations are not justified or are
simply not possible, all that can be done is to document the effects on
the evaluation in the evaluation report.







1-A

A Quick Estimate of Evaluation Costs 

Evaluation costs will vary widely from district to district. This
worksheet is intended to give a quick estimate of the cost of an
achievement evaluation in a bilingual program, using an external evaluator. It
may be possible to obtain more time from a district evaluator at less cost
to the project, depending on local policies and resources.

Two levels of effort are given. The "minimum" level is included as
a lower bound on costs. This level is not uncommon in school districts
and may meet evaluation requirements of some funding agencies, but it will
not provide useful information in most cases. The "major" level represents
a more realistic estimate for an adequate evaluation. The worksheet is
only intended to provide a ballpark estimate for a few minutes' work. It
cannot substitute for careful evaluation planning.



Evaluation Tasks Not Included 

The worksheet addresses only a limited part of a complete evaluation.
Do not forget to include the following additional tasks in your complete
evaluation plan:

1. Needs assessment

2. Process (formative) evaluation

3. Teacher workshops on evaluation

4. Student diagnostic testing

5. Cost analysis of program



Evaluator Rates 

Use actual rates if you know them. Otherwise, select one of the
following for a rough (1980) estimate:

1. Qualified independent evaluator (no overhead): $100+ per day

2. Evaluator contracted through an evaluation company
   (includes overhead): $200+ per day

3. Senior evaluator from major educational research company
   (includes overhead): $300+ per day






Cost-Estimate Worksheet

Typical Level of Effort Per Task

1. Evaluation planning (produce written plan) ____ days
   Minimum (routine performance-level evaluation in a small,
   familiar program): 3 days
   Major (comprehensive impact evaluation in a large, unfamiliar
   program): 10+ days

2. Select tests ____ days
   Minimum (familiarization with district tests): 1 day
   Major (review commercial achievement tests and match to
   curriculum; review language and affective tests; develop staff
   development and parent/community instruments; document
   process): 20+ days
   Develop achievement or language tests: not feasible

3. Train testers ____ days
   Minimum (small project, experienced testers, pre- and
   posttesting): 2 days
   Major (large project, inexperienced testers, achievement and
   language tests, pre- and posttesting): 6 days

4. Supervise testing ____ days
   Minimum (one day each, pre- and posttest): 2 days
   Major (monitor all testing): 14+ days

5. Conduct classroom visitations ____ days
   Minimum (one visit per classroom during the course of the school
   year; small program; 6 classrooms, 2 classes visited per day): 3 days
   Major (two visits per classroom; large program; 24 classrooms,
   2 classes visited per day): 24 days

6. Analyze data ____ days
   Minimum (prepare achievement data for standard computer analysis;
   small program; pre- and posttest): 5 days
   Major (comprehensive evaluation; large program; thorough study of
   computer printouts and matching results to curriculum; pre- and
   posttest): 20+ days

7. Write reports ____ days
   Minimum (one report, routine evaluation): 5 days
   Major (pre- and posttest reports, comprehensive evaluation,
   polished posttest report): 25+ days

8. Present findings ____ days
   Minimum (discuss with project director, pre- and posttest): 2 days
   Major (formal presentation for school board, discuss with project
   director, feedback to teachers): 10+ days

Total number of evaluator days (20 to 105+) ____
x Cost per day ____
Total evaluator cost ____

Additional-Cost Items

1. Test administrators (if needed): local substitute rate
2. Tests: $1.00 per student*
3. Test scoring (by hand or by scoring service): $.70 per student*
4. Computer time for analysis: less than $200*
5. Secretary time: local rates
6. Printing costs for reports: local rates

*May vary considerably. Use for rough estimates only.
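A short script can check the worksheet's arithmetic. The sketch below copies the task-day figures and 1980 daily rates from the worksheet. It treats each "+" entry as its stated floor and computes visitation days from the classroom counts given. The function and variable names are illustrative only. Note that the eight listed tasks sum to 23 days at the minimum level and 129 at the major level. The printed range of 20 to 105+ equals these totals with classroom visitations left out.

```python
# Worksheet figures (minimum, major) in evaluator-days per task.
# Entries printed as "10+" and the like are treated as their stated floor.
TASK_DAYS = {
    "evaluation planning": (3, 10),
    "select tests": (1, 20),
    "train testers": (2, 6),
    "supervise testing": (2, 14),
    "classroom visitations": (3, 24),
    "analyze data": (5, 20),
    "write reports": (5, 25),
    "present findings": (2, 10),
}

# 1980 daily rates from the worksheet (floors; "+" means "or more")
RATES = {"independent": 100, "evaluation company": 200, "senior evaluator": 300}


def visitation_days(classrooms, visits_per_classroom, classes_per_day):
    """Evaluator-days needed to make the stated number of classroom visits."""
    return classrooms * visits_per_classroom / classes_per_day


def evaluator_days(level, include_visits=True):
    """Sum of task days; level 0 = minimum, 1 = major."""
    return sum(
        days[level]
        for task, days in TASK_DAYS.items()
        if include_visits or task != "classroom visitations"
    )


def per_student_costs(n_students, test_cost=1.00, scoring_cost=0.70):
    """Test purchase plus scoring, using the worksheet's rough per-student figures."""
    return n_students * (test_cost + scoring_cost)


# The visitation entries follow from the worksheet's own assumptions.
assert visitation_days(6, 1, 2) == 3    # minimum: 6 classrooms, 1 visit each
assert visitation_days(24, 2, 2) == 24  # major: 24 classrooms, 2 visits each

for level, label in ((0, "minimum"), (1, "major")):
    days = evaluator_days(level)
    cost = days * RATES["evaluation company"]
    print(f"{label}: {days} evaluator-days, ${cost:,} at $200/day")
```

At the evaluation-company rate this gives roughly $4,600 for the minimum level and $25,800 for the major level, before the additional-cost items. Calling `evaluator_days(level, include_visits=False)` reproduces the worksheet's 20 and 105.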






FORMALIZING PROGRAM GOALS

SECTION 2



Major Content Items 

2-A. WRITING MEASURABLE GOALS FOR STUDENT ACHIEVEMENT (WORKSHEET) 
2-B. POTENTIAL BENEFITS OF BILINGUAL EDUCATION PROGRAMS (CHECKLIST)






2. FORMALIZING PROGRAM GOALS



This section is concerned with the very basic and widely ignored
task of spelling out the general kinds of impacts that a particular
bilingual program is expected to make. (Section 3 deals with describing
how the program is expected to meet these goals.) An absolute
minimum for a meaningful evaluation design should include:

• Student achievement goals

• Student affective goals

• Parent/community goals

• Staff development goals

For each set of goals, implementation time schedules are necessary
in order to determine what can be evaluated the first year of the program,
the second year, and so on. Ideally, each set of goals should also be
discussed in relation to a needs assessment, in order to justify the
goals and demonstrate that they are neither trivial nor unrealistically
difficult.

This section focuses on student achievement goals.



Key Problem: Defining Student Achievement-Goal Categories

While there are many important considerations in specifying
goals, this section is limited to one that is absolutely
essential. Goals must, at the very least, be broken down by:

a. Subject areas (e.g., reading, language, math)
b. Languages to be used (e.g., English, Spanish, etc.)
c. Student language proficiency category (e.g., English:
limited or proficient; Spanish: limited or proficient)

Using these examples, the first categories of goals would be
those for

• Reading, in English, for fluent-English students

• Reading, in English, for limited-English students

This is a most rudimentary breakdown, but it will be noted that
it requires 3 x 2 x 2 = 12 sets of goals, and these goals must be
completed for each grade level.

This section includes a worksheet and a checklist to help in
specifying goals. However, the first step must be to define
categories as in this example. Few districts provide even this
basic breakdown, but without it, the evaluation means little.
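Every combination of the three breakdowns needs its own goals, so the count grows quickly. The small Python sketch below enumerates the categories from the example above. The K-3 grade range is taken from the worksheet in this section and is used here only for illustration.

```python
from itertools import product

# Category values follow the example in the text; the grade list
# mirrors the K-3 worksheet and is illustrative only.
subjects = ["reading", "language", "math"]
languages = ["English", "Spanish"]
proficiency = ["fluent-English", "limited-English"]
grades = ["K", "1", "2", "3"]

goal_sets = [
    f"{subject.capitalize()}, in {language}, for {group} students"
    for subject, language, group in product(subjects, languages, proficiency)
]

print(len(goal_sets))                 # 12 sets of goals
print(goal_sets[0])                   # Reading, in English, for fluent-English students
print(len(goal_sets) * len(grades))   # 48 once each grade K-3 is included
```

The first two entries match the two categories listed above. Adding a subject, a language, or a proficiency level multiplies the count again.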







Related Issues 



Legal requirements. The goals of most programs must meet local,
state, and possibly federal guidelines in addition to those developed by
project personnel. Compliance with these guidelines may be the major
consideration in the goals you state and the way you state them but, in
most cases, it should still be possible to include all of the basic
categories of goals (indicated under "Key Problem") within the legal
constraints.

Short-term and long-term goals. Many projects fail to distinguish
between long-range goals that can only be evaluated over a period of
several years, and short-term, intermediate goals that may be relevant
to a one-year evaluation. This is an especially important problem in
bilingual programs since (a) some long-term goals (e.g., improved English
skills when compared to a non-bilingual classroom) may not apply until
the later grades, and (b) many bilingual programs experience high student
turnover. Long-term goals may not apply to short-term project students,
and therefore special goals may be required.

Goals for follow-up services. It is widely recognized that students
who meet existing criteria and are transferred from a bilingual program
to a conventional classroom may still be in need of special follow-up
services. In districts that provide such follow-up services, the follow-
up goals should be clearly specified and carefully integrated with the
bilingual program goals as well as with non-bilingual program goals.







Worksheet for Writing Measurable Goals for Student Achievement



A Worksheet for Writing Measurable Goals for Student
Achievement is included in this section. It is designed
for use by the evaluator in conjunction with the program
staff, and should be helpful in clarifying to the staff
how to go about writing goals systematically. The chart
is organized by subject area, by language group, and by
language of instruction. It can be adapted to meet local
needs by adding or deleting categories and including
higher grade levels. The category of English reading for
Spanish-dominant students is filled in to illustrate use
of the chart.




Worksheet for Writing Measurable Goals for Student Achievement

Year ____________

Columns (for each of Kindergarten, 1st Grade, 2nd Grade, and 3rd Grade):
Taught? | Mean Score Expected on Measure

Subject Area and Language Group

I. First and Second Language Skills

   A. For Spanish-dominant students, limited in English proficiency

      1. Spanish skills
         a. listening
         b. speaking
         c. reading readiness
         d. reading
         e. writing

      2. English skills
         a. listening
         b. speaking
         c. reading readiness
         d. reading
              Kindergarten:  Taught? no (40 kids)   Mean score expected: none
              1st Grade:     Taught? no (40 kids)   Mean score expected: none
              2nd Grade:     Taught? 5 kids*        Mean score expected: none
              3rd Grade:     Taught? 38 kids        Mean score expected: mean of
                                                    35%ile on CTBS
         e. writing

      *5 of the 40 limited-English students received instruction.

   B. For English-dominant students, fluent in English

      1. Spanish skills
         a. listening
         b. speaking
         c. reading readiness
         d. reading
         e. writing

      2. English skills
         a. listening
         b. speaking
         c. reading readiness
         d. reading
         e. writing






2-B

Potential Benefits of Bilingual Education Programs



Introduction 

One problem in the evaluation of bilingual projects is that the
expected benefits of a program are broad, and yet most evaluations tend to
focus only on those areas of student achievement for which tests are
readily available. Many other important outcomes for students, as well
as for the school and the community, may be overlooked. For example, one
of the immediate strong impacts of a well-implemented bilingual program
on limited-English-speaking students is a sense of belonging and the
ability to participate in academic and social activities. This list of
potential benefits is designed to be used by evaluators in conjunction
with program staff. The list can be used in setting and prioritizing
goals, and selecting which ones will be documented or measured. It can
also be used to highlight unintended outcomes of the program.

Instructions

For the first column, "Intended Result of Project," check those items
that are explicit goals of the program. For the second column, "This Is
Being Measured," check those items that you are measuring or documenting.
For the third column, "We See This Happening," check those items that you
feel are occurring as a result of the bilingual program.






Potential Benefits of Bilingual Education Programs

District ______________________   Date ______________________

Project Director ______________   Evaluator _________________




Students

1. More meaningful education for students of limited English
   proficiency, since students are able to participate fully in
   their educational experiences
   • student and teacher can better communicate with one another
   • student is able to participate in a broader range of school
     activities, including social as well as academic activities
   • student is better able to relate to and profit from
     instructional materials

2. Increased verbal expression
   • increased use of native language
   • increased use of second language

3. Greater sense of belonging due to acceptance of language and
   cultural diversity

4. Increased benefit from teacher guidance and counseling due to use
   of native language of student

5. Reduced alienation between parents and children because of
   school's inculcation of respect for student's home language and
   culture

6. Improved attitude toward school

7. Improved attitude toward certain school subjects

8. Improved self-concept

9. Improved motivation

10. Better race relations
   • increased interethnic play at school and at home
   • improved attitudes toward other ethnic groups
   • improved attitudes toward other languages
   • improved attitudes toward other cultures

11. Other

School District

1. Improved school climate

2. Improved relations among staff

3. Improved community-school relations

4. Greater degree of compliance with legal mandates

5. Decreased district spending due to decreased retention rate

6. Higher attendance bringing in higher ADA

7. Reduced adult/student ratios

8. Additional staff

9. Additional training and professional development for
   instructional staff

10. Additional materials, facilities, equipment, supplies

11. Other

Parents and Community

1. Greater participation in school activities

2. Increased knowledge of bilingual program operation

3. Improved race relations

4. Improved community-school relations

5. A more informed citizenry (resulting from parent/adult education
   programs)

6. Additional jobs for the community

7. Other



DESCRIBING THE PROGRAM



Major Content Items 

3-A. MODEL FOR BILINGUAL PROJECT DESCRIPTION
3-B. CLASSROOM OBSERVATION GUIDE

3-C. CATEGORY SYSTEM FOR DESCRIBING READING TREATMENT IN
BILINGUAL PROJECTS

3-D. FORMAT FOR REPORTING INSTRUCTIONAL TREATMENT






3. DESCRIBING THE PROGRAM 



Accurate impact measures represent only one side of the task of
producing a meaningful, useful impact evaluation. The other side consists
of the description of the program that produces the impact and the
analysis of why the program treatment leads to (or fails to lead to) desired
impacts. At the stage of planning an evaluation, every program goal should
be compared with the program description to be sure that there is enough
reason to expect the goal to be met to justify the effort and expense of
the evaluation. If, for example, a new bilingual program replaces all
first-grade English reading with Spanish reading, there is no reason to
expect the program to make dramatic improvements in first-grade English
reading scores. At the report-writing stage, the link between program
features and impacts must be made perfectly clear to the reader. These
obvious principles are almost universally ignored in evaluations, and
bilingual program evaluations are no exception.

As with the specification of goals, a thorough, detailed description
of the bilingual program is highly desirable. However, this section is
concerned primarily with the very basic, rudimentary description that is
absolutely essential. In addition to features discussed below, this basic
description must include (a) the broad context of the school and community,
(b) the comparison group (or norm group) treatment, (c) the legal
requirements affecting the program, and (d) teacher characteristics. (See also
Section 5, Selecting and Describing Students.)



Key Problems 

Descriptions for the use of the project director and
evaluator. The project director and evaluator need to have a very
clear picture of the project they are operating and evaluating.
At a minimum, they must be able to relate impacts to treatments
in enough detail to suggest where changes are needed and what
changes to make. The first three items in this section are
intended to help in developing the treatment description that
they require.

Descriptions for the evaluation report. The knowledgeable
readers of the evaluation report will simply not believe accounts
of major impacts unless a plausible explanation (i.e., a learning
situation substantially different from, and apparently superior
to, the conventional classroom) is offered. This description of
treatment must be clear, but need not be as detailed as that for
the project director and evaluator. (See Item 3-D.)







Related Issues 



Describing variation in treatment. In most programs, the treatment
varies for different students depending on their language skills, reading
and math skills, and other factors. In such cases each different
treatment must be described, and students must be grouped for the data
analyses according to the treatment they received.
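Grouping students by the treatment they actually received is a simple bookkeeping step, but it is easy to skip. The sketch below uses hypothetical student records and treatment labels. It groups posttest scores by treatment and reports the number of students and the mean score for each group, so that different treatments are not averaged together.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (student id, treatment received, posttest score)
records = [
    ("s01", "Spanish reading only", 31),
    ("s02", "Spanish reading only", 35),
    ("s03", "Spanish + English reading", 40),
    ("s04", "Spanish + English reading", 44),
    ("s05", "English reading only", 38),
]

scores_by_treatment = defaultdict(list)
for _student, treatment, score in records:
    scores_by_treatment[treatment].append(score)

for treatment, scores in scores_by_treatment.items():
    print(f"{treatment}: n={len(scores)}, mean={mean(scores):.1f}")
```

The per-group counts matter as much as the means. A treatment received by only one or two students cannot support conclusions about its impact.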

Longitudinal descriptions of treatments. In describing the bilingual
program, it is essential to make clear what the student experiences
throughout all of his or her years in the program. Bilingual programs
often include a coordinated curriculum for grades K-6, and the complete
program must be described.




3-A



Model for Bilingual Project Description 



A description of the bilingual project is an essential part of an
evaluation report. The Model for Bilingual Project Description presented
here was developed during the Bilingual PIP dissemination study and is
based on RMC staff expertise and the experience of the bilingual PIP
field test. Literature from the field of bilingual education was also
examined, and ideas were incorporated from similar models that have
been developed (Mackey, 1977) as well as from a wide variety of more
general current works (see, for example, Center for Applied Linguistics,
1977, 1978).



The model is divided into three major areas: (1) overview, (2)
instruction, and (3) management. Each area consists of lists of categories
to be considered in providing a comprehensive project description. Though
it is always somewhat artificial to divide an organic whole like a project
into a system of categories, the model is intended to be as systematic
and comprehensive as possible. The evaluator may utilize those sections
of the model that are particularly appropriate to the project being
described.






MODEL FOR BILINGUAL PROJECT DESCRIPTION 



1. OVERVIEW OF BILINGUAL PROJECT
1.1 Project Summary

1.1.1 Major Goals

1.1.2 Target Student Population

• Language characteristics

• Achievement levels

1.1.3 Grades and Number of Classrooms Served
1.1.4 Portion of School Day Covered
1.2 Local Context

1.2.1 Community Characteristics

• Languages

• Ethnicity

• SES

• Mobility

• Size
1.2.2 LEA Description

• Size

• Financial status of district

• Facilities available for project
1.2.3 Relevant History of LEA and Community

• Special projects

• Desegregation

2. INSTRUCTIONAL APPROACH

2.1 Content of Instruction

2.1.1 Content Areas Covered
2.1.2 What Determines Content
2.1.3 Other Content Features

• Relationship of content to goals

• Articulation of project content with existing district
curriculum






2.2 Presentation of Content

2.2.1 Instructional Models or Theories

• Bilingual education model

• Other model

2.2.2 Methodologies for Bilingual Education

• Language of instruction

- general language use plan for teacher and
student over length of project

- daily instructional time in each language

- variations for different student groups

- criteria for establishing language of
instruction

• Approach to non-standard forms

- acceptance

- form of corrections

• Approach to second language instruction

- formal instruction

- functional use of second language for content
instruction and other activities

• Approach to reading instruction

- language in which students learn to read

- criteria for beginning reading in second
language

2.2.3 Specific Methodologies for Each Subject Area

2.2.4 Rate

• Variation in pace of instruction for individuals
or groups

• Time on task

- minutes per day per content area (see Scheduling,
2.4)

- proportion of time student is actively engaged
in producing responses for which s/he gets
feedback






2.2.5 Self-Concept Development and Motivation (aspects of
program that may motivate students and improve their
self-concept)

• Appropriate content and language of instruction

- using for instruction

- accepting the language of the student

- content that relates to experience of students

- culturally relevant material
• Improved affective climate

- placing equal value on both languages and cultures

- ensuring student success

- involving parents

- teacher as a role model
• Discipline approach

- philosophy

- guidelines/control over approach
• Special reward systems

- prizes, privileges
2.2.6 Materials

• Core materials in use

- commercial

- locally developed
• Appropriateness

- linguistic

- cultural
2.2.7 Personnel Roles in Classroom
• Teachers
• Aides
• Parents
• Peers

• Resource staff
2.3 Student Selection

2.3.1 Entry Criteria and Procedures
2.3.2 Exit Criteria and Procedures







2.4 Scheduling

2.4.1 Grouping and Regrouping

• Across classes

• Within classes

2.4.2 Daily Schedules

3. MANAGEMENT

3.1 Staff Organization

3.1.1 List of Staff Members and Time Commitment

3.1.2 Organizational Structure

3.1.3 Qualifications

3.1.4 Selection Procedures

3.2 Staff Roles (describe responsibilities)

3.2.1 Project Director

• Style of leadership as determined by project and LEA

• Funds and budgets

• Public relations

• Administration

• Overseeing instruction

• Staff training

• Developing and ordering materials and equipment

• Staff recruiting and hiring

3.2.2 Teachers

• Planning instruction

• Implementing instruction

• Non-instructional responsibilities

3.2.3 Aides

3.2.4 Other Staff

• Instructional coordinator

• Community coordinator

• Evaluator






3.3 Staff Development (describe)

3.3.1 Needs Assessment

3.3.2 Structure of Training

• Pre-service

• In-service

3.3.3 Characteristics of Training

• Appropriateness for staff of differing levels of
knowledge and experience

• Practicality

• Coordination with degree programs
• Integration with other training

3.3.4 Audiences Trained

• Project staff included

• Inclusion of non-project staff

3.4 Parents and Community

3.4.1 Parent Involvement in School Affairs

3.4.2 Community Input in Program Planning

• Advisory group

3.4.3 Community Support for Project

3.4.4 Parent Education

3.4.5 Parent Conferences/Counseling

3.5 Communication

3.5.1 Staff Relations

3.5.2 Relations with Non-Project Staff

• District administrators

• Principals

• Non-project teachers

• School board

3.5.3 Dissemination of Project Information

• School personnel

• Parents and community






3-b 

Classroom Observation Guide

Reference was made earlier to the need to establish the amount of
participation students have in the program, as well as the type of instruction
received by the participants. The evaluator must know and describe
what it is that is being evaluated.

The following form is provided as an example of a classroom analysis
tool that can be used to document the qualitative and
quantitative characteristics of features deemed essential to bilingual
education. The value of conducting classroom observations will become
apparent when attempting to analyze data and when providing feedback to
the school. An evaluator, as an observer in a classroom, is there to
gauge the potential for certain practices and characteristics previously
identified as desirable. For example, it is agreed that having parents
participate in the school is desirable. However, an evaluator is
unlikely to witness such an event during a classroom visit. Therefore,
it would be necessary for the evaluator to interview teachers and PAC
members, and to see some documentation of parent participation. The Model
for Bilingual Project Description can be used to construct the necessary
interview guides for any of the program components, and therefore supplements
the Classroom Observation Guide.

Classroom observation is a complex undertaking. It should be done
as frequently as possible in order to obtain a reliable description. In
addition, the person(s) conducting the observations should be adequately
trained. This is a simple, rudimentary guide that districts can use
to develop guides that are more applicable to their own needs.






CLASSROOM OBSERVATION GUIDE



Date 

School District 

School 

Teacher 

Aide 

Grade

Observer

Duration of Observation

Interview Required Yes No






CLASSROOM OBSERVATION GUIDE 



Subject: Lesson: 

(Use one sheet per subject) 

Methodology/Theory: 

Materials: 



GROUP                    INSTRUCTOR(S)   LANGUAGE OF          DURATION PER WEEK         NOTES
                         AND ROLE        INSTRUCTION          (in hours of exposure)
Characteristics   Size                   Teacher   Student    Subject   Language

BASIS FOR GROUPING: 

Criteria: 

Assessment Method: 
Permanence: 






CLASSROOM OBSERVATION GUIDE (Continued)



Page 2 



Composition of Classroom:

Number of students of limited English proficiency
Degree of segregation/integration:



Physical/structural Layout: 



Approach to Culture and Heritage in Lessons Observed:



Visual Displays: 



CLASSROOM OBSERVATION GUIDE (Continued)



Page 3 



Nature/Tone of Interaction:



Language Use by Teachers and Students in Non-Instructional Settings: 



Student Attendance, Turnover:



Other Observations (Optional): 



3-c 



CATEGORY SYSTEM FOR DESCRIBING READING TREATMENT 
IN BILINGUAL PROJECTS 



Interpretation of evaluation data from bilingual projects is not
easy, due to the complex nature of the projects as well as the variation
in instructional treatments across and within classrooms.
The more teachers tailor instruction to the particular needs of
individuals or different groups of students, the more difficult it
becomes for the evaluator to document the instructional treatment
received by students in the program. Nevertheless, the treatment must be
described in order to aggregate and analyze data in a meaningful way
and in order to interpret findings adequately.

In the area of reading, for example, it is essential to know how
much reading instruction, if any, was received by each student in each
language. The reader of an evaluation report should not be led to
assume that a reading test score for a particular student in a particular
language represents one full year of reading instruction in that
language if such is not the case for all students. By using this
category system, inappropriate aggregation and misinterpretation due to
lack of information can be avoided.



The Category System can be used to provide a very basic description
of the type of reading instruction received by each student. The system
was developed from observations of a large number of bilingual programs.
It was then pilot tested in two school districts and refined based
on user feedback. The eight categories have been designed to include the
most common instructional situations encountered in bilingual projects.
They include types of reading instruction for students of limited English
proficiency as well as for those proficient in English. They are offered
for Spanish/English and French/English programs, but may be adapted to any
language. For a more thorough description, the precise amount of time
that each type of reading instruction was provided to each student can be
recorded.






The category label should be assigned by the teacher
providing instruction. This normally requires 20 to 30 minutes per
teacher. Considerably more thorough interpretation of program outcomes
is possible if this information is recorded for each year of a student's
participation in the project.






CATEGORY SYSTEM FOR DESCRIBING READING TREATMENT
IN BILINGUAL PROJECTS
Spanish/English Form



Category Label   Definition

S        Daily reading in Spanish only, for entire year.
         No English reading.

SE       Daily reading in both Spanish and English for
         entire year.

S-SE     Daily reading in Spanish only, from fall to mid-year
         (sometime between December and March).
         Daily reading in both Spanish and English from
         mid-year to end of year.

S-E      Daily reading in Spanish only, from fall to mid-year
         (sometime between December and March).
         Spanish reading discontinued at mid-year. Transfer
         to daily reading in English only from mid-year
         until end of year.

E        Daily reading in English only, for entire year.
         No Spanish reading.

E-ES     Daily reading in English only, from fall to mid-year
         (sometime between December and March). Daily
         reading in both English and Spanish from mid-year
         to end of year.

?        Reading treatment unknown.

O        Other. Please describe.



Instructions

• Assign category labels to all project students (or at least to all
students for whom reading achievement data are collected).

• If a comparison group is used, assign category labels to all comparison
students.

• Assign a category label to each individual student, even if the entire
class receives the same reading treatment.

• Record this information in a column adjacent to reading achievement
scores.
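The instructions above amount to a simple record layout: one treatment label per student, stored next to that student's reading score, so that scores are averaged only within a treatment category. The Python sketch below is a minimal illustration under stated assumptions; the field names `student_id`, `score`, and `treatment` and both helper functions are hypothetical, not part of the Category System itself, and the scores are made up. Students with no assigned label are coded `?` (treatment unknown) rather than silently dropped.

```python
from collections import defaultdict

# The eight labels of the Spanish/English form of the Category System.
SPANISH_ENGLISH_CATEGORIES = {
    "S": "Daily reading in Spanish only, for entire year",
    "SE": "Daily reading in both Spanish and English, for entire year",
    "S-SE": "Spanish only from fall to mid-year, then both languages",
    "S-E": "Spanish only from fall to mid-year, then English only",
    "E": "Daily reading in English only, for entire year",
    "E-ES": "English only from fall to mid-year, then both languages",
    "?": "Reading treatment unknown",
    "O": "Other (described separately)",
}


def label_scores(score_rows, labels):
    """Return copies of the score rows with a 'treatment' column added.

    score_rows: list of dicts with hypothetical keys 'student_id' and 'score'.
    labels: dict mapping student_id to the label assigned by the teacher.
    Students with no assigned label are coded '?' (treatment unknown).
    """
    labeled = []
    for row in score_rows:
        label = labels.get(row["student_id"], "?")
        if label not in SPANISH_ENGLISH_CATEGORIES:
            raise ValueError(f"unknown category label {label!r} for student {row['student_id']}")
        labeled.append({**row, "treatment": label})
    return labeled


def mean_score_by_treatment(labeled_rows):
    """Average scores separately within each treatment category, never across them."""
    groups = defaultdict(list)
    for row in labeled_rows:
        groups[row["treatment"]].append(row["score"])
    return {label: sum(scores) / len(scores) for label, scores in groups.items()}


# Illustrative (made-up) scores for three students; student 003 has no label.
rows = [
    {"student_id": "001", "score": 34},
    {"student_id": "002", "score": 41},
    {"student_id": "003", "score": 28},
]
labeled = label_scores(rows, {"001": "S", "002": "S-SE"})
print(mean_score_by_treatment(labeled))  # {'S': 34.0, 'S-SE': 41.0, '?': 28.0}
```

Raising an error on an unrecognized label is a deliberate choice in this sketch: a mistyped label would otherwise create a spurious ninth category in the summary.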






CATEGORY SYSTEM FOR DESCRIBING READING TREATMENT 
IN BILINGUAL PROJECTS 
French/English Form 



Category Label   Definition

F        Daily reading in French only, for entire year.
         No English reading.

FE       Daily reading in both French and English for
         entire year.

F-FE     Daily reading in French only, from fall to mid-year
         (sometime between December and March).
         Daily reading in both French and English from
         mid-year to end of year.

F-E      Daily reading in French only, from fall to mid-year
         (sometime between December and March).
         French reading discontinued at mid-year. Transfer
         to daily reading in English only from mid-year
         until end of year.

E        Daily reading in English only, for entire year.
         No French reading.

E-EF     Daily reading in English only, from fall to mid-year
         (sometime between December and March). Daily
         reading in both English and French from mid-year
         to end of year.

?        Reading treatment unknown.

O        Other. Please describe.



Instructions

• Assign category labels to all project students (or at least to all
students for whom reading achievement data are collected).

• If a comparison group is used, assign category labels to all comparison
students.

• Assign a category label to each individual student, even if the entire
class receives the same reading treatment.

• Record this information in a column adjacent to reading achievement
scores.






3-d 



Format for Reporting Instructional Treatment



When reporting student achievement outcomes, a brief description of
the treatment should accompany the data. If results are reported by grade
level, then the treatment should also be described by grade level, provided
the entire grade level received similar instruction. Report achievement
outcomes and instructional treatments separately for those classrooms
where an instructional feature or characteristic may have strongly influenced
achievement outcomes. For example, if all second-grade teachers in
a program except one were bilingual, report outcomes and describe treatment
separately for that one class. As another example, if English reading
for a third-grade class differed from all the others because this class
was participating in a Title IV reading lab (not part of the Title VII
program), then this class becomes a separate unit of analysis.

Completed classroom analysis guides (see Section 3-B) are the best
source of information for describing instructional treatment. Additional
categories of topics to be described can be drawn from the Project Description
Model.

The format for describing the treatment could be a chart or a brief
narrative. A chart is provided as a sample of a format that can be used
for summarizing instructional treatment in an evaluation report. If the
entire project's instructional treatment were to be described, a separate
chart would be used for each subject, for each language group, and for
each grade level receiving similar treatment. A completed chart is provided
as an example of a description of the instructional treatment for
Spanish reading for limited-English-speaking students in the second
grade.
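The rule above (one chart per subject, language group, and grade level, with any classroom whose treatment differed markedly reported on its own) can be sketched as a grouping step that also produces the two counts each chart asks for. This Python sketch is an illustration under stated assumptions: the record keys, the function name `reporting_units`, and the sample records are all hypothetical.

```python
from collections import defaultdict


def reporting_units(records, separate_classrooms=frozenset()):
    """Group student records into units that each get their own treatment chart.

    records: iterable of dicts with hypothetical keys 'student_id', 'subject',
        'language_group', 'grade', and 'classroom'.
    separate_classrooms: IDs of classrooms whose treatment differed markedly
        (for example, the one class without a bilingual teacher); each of these
        becomes a separate unit of analysis.
    Returns {unit_key: {"students": n, "classrooms": m}}.
    """
    students = defaultdict(set)
    classrooms = defaultdict(set)
    for r in records:
        key = (r["subject"], r["language_group"], r["grade"])
        if r["classroom"] in separate_classrooms:
            key = key + (r["classroom"],)  # report this class on its own
        students[key].add(r["student_id"])
        classrooms[key].add(r["classroom"])
    return {key: {"students": len(students[key]), "classrooms": len(classrooms[key])}
            for key in students}


# Illustrative (made-up) records: three second-grade classes, class C reported separately.
records = [
    {"student_id": 1, "subject": "Spanish reading", "language_group": "LEP", "grade": 2, "classroom": "A"},
    {"student_id": 2, "subject": "Spanish reading", "language_group": "LEP", "grade": 2, "classroom": "A"},
    {"student_id": 3, "subject": "Spanish reading", "language_group": "LEP", "grade": 2, "classroom": "B"},
    {"student_id": 4, "subject": "Spanish reading", "language_group": "LEP", "grade": 2, "classroom": "C"},
]
units = reporting_units(records, separate_classrooms={"C"})
for key, counts in units.items():
    print(key, counts)
```

Counting distinct student IDs (rather than rows) keeps a student who appears in more than one record from being counted twice in the "No. of students" line.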







Format for Reporting Instructional Treatment

Spanish Reading (Subject)
for Students Limited in English Proficiency (Language Group)

Grade Level: 2nd grade
Year: 1979

No. of students receiving this type of treatment: 146
No. of classrooms represented: 7

Subject: Spanish reading. All classes follow project objectives for
Spanish reading and use the same basic texts.

Language Use by Teachers and Students:
   Teachers: instruct entirely in Spanish.
   Students: participate almost exclusively in Spanish. English responses
   are accepted when appropriate.

Instructor Characteristics: 6 of the 7 teachers are bilingual. The aide
of the 7th class teaches this subject and is bilingual. All teachers and
aides have received training in teaching Spanish reading.

Subject Taught Hours per Week: Taught an average of 6-2/3 hours per week.
This includes Spanish Language Arts.

Grouping Characteristics: All Spanish-dominant students receive Spanish
reading. Reading groups are formed according to achievement level. Groups
are semi-permanent.

Comments: Group N of 146 does not include 16 students who entered the
program late in the year. These 16 students were not pretested and will
not be included in outcome summaries. One classroom was excluded from
analysis because its bilingual teacher was transferred at mid-year and no
bilingual substitute was provided.






Format for Reporting Instructional Treatment: ____________________ (Subject)

for ____________________ (Language Group)

Grade Level ________
Year ________

No. of students receiving this type of treatment ________
No. of classrooms represented ________

Subject:

Language Use by Teachers and Students:
   Teachers:
   Students:

Instructor Characteristics:

Subject Taught Hours per Week:

Grouping Characteristics:

Comments:






CHOOSING AN EVALUATION DESIGN 



SECTION 4



Major Content Item 

4-A. A GUIDE TO EVALUATION DESIGNS






4. CHOOSING AN EVALUATION DESIGN



Evaluation design is an extremely complex field. Relatively few
evaluation specialists have the necessary skills to develop and implement
a new, specialized evaluation design. However, choosing from among existing,
conventional designs is one of the easiest steps in the evaluation
process. This is because there are only a few realistic choices, and, to
a large extent, the decisions will be determined by local conditions. The
seemingly endless and often esoteric alternatives that provide the subject
matter for countless books, articles, and conference papers quickly evaporate.
In practice, with few exceptions, they are either (a) technically
unsound, (b) impossible to implement in a school setting, (c) so complex
that only a few experts in the country are qualified to apply them, or (d)
some combination of the above.

Two points should be kept in mind in order to avoid unrealistic expectations
for evaluation designs. First, even the best designs are not
sensitive enough to provide a convincing demonstration of most program
impacts. This is simply because program impacts are usually small, and
are easily obscured by the effects of many other factors (see Appendix A).
However, if a program produces large impacts, an appropriate,
carefully implemented design will probably provide convincing evidence.
Second, most bilingual program evaluation designs are affected by local
policies and conditions and by legal and funding agency regulations.
In combination, these constraints may completely preclude any accurate
assessment of program impact. The only productive option for the project
director and evaluator may be to eliminate meaningless impact evaluation
activity where regulations and policies permit, and to concentrate on
performance-level or other potentially useful information.



Key Problems

Deciding which questions you want to answer. Many evaluations
are carried out and reported with no thought as to the exact
questions that are being asked or the implications of the answers.
Questions that you may wish to ask include: (a) Are students doing
better than they would in a conventional classroom? (b) Are they
doing better than similar students in other local programs? (c)
Are they doing better than similar students in nearby districts, in
the entire state, or in the entire country? (d) Are they doing
better than last year's program students? (e) Are they doing well
enough? (f) Last but not least, has the program improved student
performance?

Deciding which questions you can answer. All of the above
questions may be of interest, but few evaluations answer all of
them equally well. Item 4-A suggests which questions can be
answered and outlines some of the problems involved in answering
each one.






Related Issues 



Short-term versus long-term evaluations. While most bilingual
programs cover several grades, evaluations are often designed as if
each grade represented an isolated, complete program. A common example
is to test a subject (e.g., English reading) at grades before the
subject has been introduced. While some progress toward the
ultimate program goals should be observable at each grade level, the
progress may not include all subjects at all grades, and the evaluation
design should reflect this. Conversely, there is no way to evaluate
the total impact of a program until students have completed the whole
program; thus, every bilingual project evaluation should be viewed as
a long-term effort.

Overtesting. In addition to the direct costs, testing places a
great burden on teachers and students. Simply listing all of the
subject areas of interest and finding a test for each one will almost
certainly lead to an unreasonable amount of testing. In general, it is
advisable to select only the most important areas for special testing
and to make use of required district tests wherever possible.

Fall-to-spring versus twelve-month test intervals. A major
decision is whether to evaluate impact over a seven-month or a twelve-month
period. The shorter period gives a quicker answer and reduces
problems of student turnover. However, it may also give a misleading
picture due to the short-term impact of some programs, with possible
losses over the summer. The twelve-month period is recommended wherever
district policy and other factors permit, since it reduces the testing
burden and appears to provide a more meaningful picture of long-term
program impacts.






4-a 

A Guide to Evaluation Designs
Outline 



This section has not been developed beyond the rough outline stage,
pending agreement among USOE and potential users as to appropriate content.



Restricting the Questions 

The evaluation design depends on the question.

Question 1. How well are the students performing?
Question 2. How effective is the program compared to previous or
alternative programs?

We will not consider:

Do students make achievement gains? (Most students make gains in most
subject areas due to maturation.)
Do students meet objectives? (Objectives are arbitrary unless they are
based on past years (see "Time series," below) or on other groups
of students (see "Comparison groups," below).)



Answering Question 1: Performance Level

This is possible in most districts, at least for subjects with English-language
tests.

I. Two kinds of information are required:

A. Background/experience of the students (See Section 5)

A general picture of language, schooling, and home/community
background is needed if performance levels are to have any
meaning at all.

B. Performance measures (See Sections 2, 6, and 7)

1. The skills that are measured must be at least generally
relevant to the program goals.

2. The tests must be reliable, valid, and of the appropriate
difficulty.

3. Testing and scoring must be done with great care.

II. A frame of reference is required. Program students can be compared
to any group of interest. Typical comparisons are:
A. National norms from standardized tests
B. Local comparison groups

C. Program students (or, if the program is new, similar students)
from previous years.

III. In addition, interpretation of performance levels requires a description
of the relevant instructional treatment.






Answering Question 2: Program Impact



Very difficult or impossible in many districts.



The question is: What is the effect of the program (separated
from other factors)? (a) Is it an improvement over what was
being done before? or (b) Is it better than some other program
of interest? The "program" here includes procedures,
materials, and personnel. Separating the effects of personnel
from the effects of procedures and materials is an extremely
difficult task, and is completely beyond the scope of this
manual.



All of the designs listed below assume test data and background/experience
data of the highest quality. Preexisting data
from district files are generally not acceptable.

I. Classes of designs using single comparisons.

A. Norm-referenced design

Comparison: National norms from standardized tests.
Measure: Gain in standard score (e.g., NCEs) from
pretest to posttest.
Problem: Considerable variance among schools (sd of 5
to 10 NCEs) in normal yearly gains or losses.
Credibility: Low.

B. Comparison-group design (non-random assignment)

Comparison: Students in the same district. Both the
comparison students and their instruction
differ from the bilingual program students.
Intended to answer question "b" above.
Measure: Program-student gain compared to comparison-student
gain. Pretest differences adjusted
by principal-axis adjustment (Section 8).
A multiple regression approach is a possible
improvement.

Problem: Differences in students, rather than in the
program, may produce differences in gains.
Credibility: Low.

C. Time-series design

Comparison: Gains of this year's students are compared to
gains of comparable students from past years.
Intended to answer question "a" above.
Measure: Program-student gain compared to gain of similar
groups from preceding years (at least two years
are needed). Pretest differences adjusted by
principal-axis adjustment (Section 8). Multiple
regression not needed.
Problem: Changes in school or community may produce
differences in gains.
Credibility: Low.
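The three single-comparison designs differ mainly in what a program group's gain is compared against. The Python sketch below assumes scores are available as national percentile ranks and converts them to NCEs using the standard definition of the NCE scale (a normal-curve scale with mean 50 and standard deviation 21.06, chosen so that NCEs of 1, 50, and 99 match those percentiles). The function names and sample percentiles are hypothetical, and `gain_difference` is an unadjusted comparison only: it does not perform the principal-axis pretest adjustment that the outline refers to Section 8.

```python
from statistics import NormalDist, mean

NCE_SD = 21.06  # standard deviation of the NCE scale (mean 50)


def percentile_to_nce(percentile):
    """Convert a national percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100)
    return 50 + NCE_SD * z


def nce_gains(pre_percentiles, post_percentiles):
    """Per-student NCE gain from pretest to posttest (matched students only)."""
    if len(pre_percentiles) != len(post_percentiles):
        raise ValueError("pretest and posttest lists must be matched student by student")
    return [percentile_to_nce(post) - percentile_to_nce(pre)
            for pre, post in zip(pre_percentiles, post_percentiles)]


def gain_difference(program_gains, reference_gains):
    """Mean program gain minus mean reference gain, with no pretest adjustment.

    The reference group is a local comparison group (design B) or comparable
    students from preceding years (design C).
    """
    return mean(program_gains) - mean(reference_gains)


# Design A: a mean NCE gain of zero means students held their national percentile rank.
program = nce_gains([20, 30, 25], [24, 33, 30])  # made-up percentiles
print(round(mean(program), 1))
```

Because normal yearly gains already vary among schools with a standard deviation of 5 to 10 NCEs, a mean gain of a few NCEs from one program cannot, on its own, be distinguished from ordinary school-to-school variation; this is why each single-comparison design is rated low in credibility.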






II. Recommended design using a combination of comparisons.



For a credible impact evaluation, all three of the above designs
should be combined. Data are required on program-type students
plus additional representative groups from the district for several
years prior to the start of the new bilingual program. Data on
all of the groups should be obtained for several years after the
initiation of the new program. This design answers both questions
"a" and "b" above, as well as providing information on whether
the new program is improving over the years.

Norms provide a common metric for comparing groups.

Time-series data show changes in the performance of program-type
students.

Comparison groups show that the change is not due to school-wide
factors.



Remaining problems:

Shifts in characteristics of program-type students.
Artificial depression of pretest scores, or inflation of
posttest scores for program students.







DESCRIBING STUDENT SKILLS FOR SELECTION 
AND DATA ANALYSIS 



Major Content Item 

5-A. DEMOGRAPHIC AND BIOGRAPHIC INFORMATION WORKSHEET 



5. DESCRIBING STUDENT SKILLS FOR SELECTION
AND DATA ANALYSIS



An accurate description of each student's skills is essential, first
for selecting program students and later for organizing performance-level
and impact data at the analysis stage. Of course, a clear picture of student
skills should also guide the instructional program, although instructional
planning is beyond the scope of this manual.

The absolute minimum information on each student must include skills
in both relevant languages as well as skills in the program's major subject
areas. A truly adequate description will also include information on the
student's learning background and current environment. The latter categories
of information are especially crucial at the data analysis stage,
since they play a major role in determining what can be expected of each
student. For example, given a student with a low English reading pretest
score, we might expect much greater improvement if the student were a high-SES
new arrival with no previous training in English reading than we would
if the student were from a low-SES background and had been in high-quality
bilingual programs for several years. Thus, students must be grouped
according to both current skills and past experience if meaningful data
analyses are to be conducted.
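The grouping rule above can be made concrete with a small sketch in which each student's analysis group combines a current-skill band with background information, so that the two students in the example (same low pretest, different histories) land in different groups. The Python below is illustrative only: the keys `pretest`, `prior_english_reading_years`, and `ses`, the function names, and the locally chosen cutoff are all hypothetical, and a real analysis would use the fuller background data from the worksheet in this section.

```python
from collections import defaultdict


def analysis_group(student, low_pretest_cutoff):
    """Return a (skill, experience, SES) key for one student.

    student: dict with hypothetical keys 'pretest', 'prior_english_reading_years',
        and 'ses' ('low' or 'high'); the cutoff is chosen locally.
    """
    skill = "low pretest" if student["pretest"] < low_pretest_cutoff else "higher pretest"
    experience = ("no prior English reading" if student["prior_english_reading_years"] == 0
                  else "prior English reading")
    return (skill, experience, student["ses"] + " SES")


def group_students(students, low_pretest_cutoff):
    """Collect student IDs by analysis group so gains are interpreted within groups."""
    groups = defaultdict(list)
    for s in students:
        groups[analysis_group(s, low_pretest_cutoff)].append(s["id"])
    return dict(groups)


# The two students from the example: similar low pretests, very different histories.
students = [
    {"id": "A", "pretest": 15, "prior_english_reading_years": 0, "ses": "high"},
    {"id": "B", "pretest": 14, "prior_english_reading_years": 3, "ses": "low"},
]
groups = group_students(students, low_pretest_cutoff=20)
for key, ids in groups.items():
    print(key, ids)
```

Expected improvement is then judged within each group, not for the pooled low-pretest students as a whole.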

In writing evaluation reports, an accurate description of student
background and skills is also essential. Few bilingual-program evaluation
reports provide enough information for the reader to make any judgment as
to the credibility or importance of the results.



Key Problems

Describing skills in two languages. The problems of measuring
language skills are discussed in Section 6. The problem of concern
here is simply that many evaluations ignore target-language skills.
Selection, instruction, and data analysis are often based only on
the fact that students have limited English skills. In some projects,
the implicit assumption of superior skills in the target
language is entirely justified. In many of the projects observed
by RMC, however, target-language skills were even lower than
English skills, sometimes substantially so. If such situations are
not made clear in the evaluation report, the results become completely
misleading.

Describing student backgrounds. A clear picture of the
program students' environment and learning history is also
essential to an accurate understanding of impact evaluation
results. Apparently few projects collect this information for
impact evaluation purposes, and even fewer present a systematic
treatment of this information in their evaluation reports.
While this manual does not provide a complete guide to the appropriate
use of such information, the Demographic and Biographic Information
Worksheet in this section should provide a starting point. (See
also: Data Analysis, Section 8.)







Related Issues 



Combining measures into a single selection score. Selection for
a bilingual program may be based on student background categories,
language test scores, achievement test scores, and teacher judgment.
If the school district is willing to quantify all of these measures
(including teacher judgment), arrive at a single score, and use this
score as the sole basis for assigning students to the program, then
statistical corrections to achievement gains become possible, and the
accuracy of the achievement impact evaluation may be considerably
improved.
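A minimal sketch of such a single selection score follows, assuming each measure is first standardized to z-scores and then combined with locally chosen weights; the measure names, weights, cutoff, and sample values are hypothetical. Assignment depends on nothing but the composite score, which is the condition the paragraph above describes; cutoff-based assignment of this kind is the setting usually analyzed as a regression-discontinuity design.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores (mean 0, population SD 1)."""
    m, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("a measure with no variation cannot be standardized")
    return [(v - m) / sd for v in values]


def selection_scores(measures, weights):
    """Weighted sum of standardized measures, one composite score per student.

    measures: dict of measure name -> raw scores, all lists in the same student order.
    weights: dict of measure name -> weight (hypothetical, set by the district).
    """
    z = {name: standardize(values) for name, values in measures.items()}
    n = len(next(iter(measures.values())))
    return [sum(weights[name] * z[name][i] for name in measures) for i in range(n)]


def assign_to_program(scores, cutoff):
    """Students at or below the cutoff (greatest need) are assigned to the program."""
    return [score <= cutoff for score in scores]


# Hypothetical measures for three students; teacher judgment quantified as a 1-5 rating.
measures = {
    "english_language_test": [10, 20, 30],
    "teacher_rating": [1, 3, 5],
}
weights = {"english_language_test": 0.5, "teacher_rating": 0.5}
scores = selection_scores(measures, weights)
print(assign_to_program(scores, cutoff=0.0))  # [True, True, False]
```

Standardizing first keeps a measure with a wide raw-score range (such as a test scored out of 100) from dominating a measure with a narrow one (such as a 1-5 teacher rating).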

Longitudinal student profiles. Since most bilingual programs span
several grade levels, the value of student descriptions is increased
greatly by creating longitudinal student profiles. Since most schools
keep permanent student record files, only minor additions may be
required to ensure that the appropriate background (and treatment)
information is readily available for each program student.

Selecting and describing students who are proficient in English.
Many programs include substantial numbers of monolingual, native-English
speakers and bilinguals who are highly proficient in English.
For these students, it is not necessary to maintain the same amount of
information on English-language experience. Of course, these students
must be analyzed separately from those who are learning English as a
second language.






5-a 



Demographic and Biographic Information Worksheet 

This demographic and biographic information worksheet can
be used to document information that will enable the evaluator
to interpret results with a higher degree of accuracy. The information
gathered can also be used for individual pupil records
so as to facilitate continuity of instruction across the years.

The demographic information should be collected per school,
whereas the biographic information must be collected for each
student. Similar information should be collected for control or
comparison students if the evaluation design incorporates a
control or comparison group.



School Year 
School District 
School(s) 



Information Collected by 
Date 




Demographic and Biographic Information Worksheet 
(Check appropriate answer In margin) 



I# School/Coimnunlty Character is tics 

A« These Bchool/communlt/ characteristics apply tot 

(Use one form per group «) A« 

1) treatment students 1)^ 

2) comparison students 2)^ 

B# How Is prefect student defined? 



C« What percentage of the students In the project 
school(s) are In the Title VXX project? 
(List by school name or code; put percentage In 
margin*) 

School name or code C« 

1) 1) 

2) 2) 

3) 3) 

4) 4) 

5) 5) 

D« Do(es) the school(s) participate In the free 
lunch program? (write yes or no in margin) 
School 

1) 1) 

2) ^ 2) 

3) 3) 

4) 4) 

5) ^ _ 5) 

E. What criteria are used to determine a student's English language proficiency classification (e.g., LEP/LESA)? (Check appropriate answer in margin.) E.

1) Teacher judgment 1)

2) Language proficiency test 2)
Test:

Cutoff:

3) Achievement test 3)
Test:

Cutoff:

4) Combination of the above (specify) 4)

5) Other (explain) 5)

F. At the time of fall testing, what percentage of the students in the project were classified as limited in English proficiency? F.






G. What is the size of the district in which the project is located? (Check appropriate answer in margin.) G.

1) 12,000 or more students 1)

2) 3,000 to 11,999 2)

3) 1,000 to 2,999 3)

4) less than 1,000 4)

H. In what type of community is the project located?

(Check appropriate answer in margin.) H.

1) Metropolitan 1)

2) Urban 2)

3) Suburban 3)

4) Rural 4)

I. What percentage of the community is Hispanic (or of the ethnic group being served by the program)? I.






Pupil Characteristics (Use one form per student.)

A. For record-keeping purposes, what is this pupil's:

1) Code number

2) Age (as of fall of 19__)

3) Ethnicity

4) Language classification:

a) Limited in English proficiency

b) Proficient in English

B. Which group does this pupil belong to?
(Check appropriate answer in margin.)

1) Treatment

2) Comparison

C. In what country was the student born?
(Check appropriate answer in margin.)

1) United States

2) Spanish-speaking country (or country where target language is spoken)

3) Other (please specify)

4) Unknown

D. To the best of your knowledge, how long has the student been in the U.S., as of fall 19__ (current year)?

1) Less than one year

2) One to two years

3) More than two years

E. What is the number of years of schooling this student has completed outside the U.S.?

1) Years of schooling outside U.S.

2) Don't know

3) N/A

F. What language is most frequently spoken to the student at home?

1) Spanish (or non-English program language)

2) English

3) Equal use of two languages

4) Other

G. What language does the student use most frequently at home?

1) Spanish (or non-English program language)

2) English

3) Equal use of two languages

4) Other

H. How was the information in Item E. obtained?

1) Parent survey

2) Self report (child)

3) Teacher/staff judgment

4) Other




I. How was the information in Item F. obtained? I.

1) Parent survey 1) 

2) Self report (child) 2) 

3) Teacher/staff judgment 3) 

4) Other 4) 

J. In which of the following programs is the student currently participating? (Please check.) J.

1) Free lunch program 1)

2) Title I 2)

3) Migrant 3)

4) ESAA 4)

5) Other (Please specify.) 5)

K. Indicate the number of years this student has participated in bilingual education programs, prior to current year. K.

L. How would you characterize this student's absentee rate? L.

1) seldom: approximately ___ day(s)/month 1)

2) average: approximately ___ day(s)/month 2)

3) frequent: approximately ___ day(s)/month 3)

M. What percentage of the children in the student's classroom of instruction (at the time of fall testing) were considered students of limited English proficiency? M.






SELECTING TESTS

Content Items in this Section

6-A. SELECTING AN ACHIEVEMENT TEST

6-B. SELECTING A LANGUAGE PROFICIENCY TEST

6-C. A SYSTEM FOR COMPARING CURRICULUM CONTENT WITH THE CONTENT OF CTBS SPANISH AND ENGLISH, FORMS B AND C






6. SELECTING TESTS



Although catalogues of tests list thousands of titles, the selection of tests for a bilingual-program evaluation actually involves few real choices. This is because (a) federal, state, and local regulations largely determine the subject areas to be tested, and often the pool of acceptable tests as well, and (b) only a handful of the available tests meet minimum technical requirements. While satisfactory tests are available for basic subject areas, the perfect test does not exist, and searching for such tests among the more obscure titles is an expensive exercise in futility.

The major concern in selecting tests is to be sure that all major program goals are covered (i.e., all major subjects, at all relevant grades, and, where warranted, in both languages). Tests must meet reasonable standards of reliability and validity. It is also advisable to check the technical manual to see that the test publisher has employed procedures designed to reduce cultural and linguistic bias.



Key Problems 

Matching the tests to the program. The minimum matching requirements are simply to test the subjects that are included in the bilingual program and not to test specific subject matter before it has been introduced. Following these two simple and obvious rules would drastically improve many evaluations. A more thorough matching process is advisable and is addressed in Item 6-C.
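The two minimum matching rules can be checked mechanically once the program's subjects and the month in which each tested topic is introduced have been listed. The sketch below is hypothetical: the function name and the data layout (a set of program subjects, a mapping from test topics to the month of instruction in which they are introduced, counted from the start of the school year) are assumptions for illustration, not a procedure from this guide. A topic that is never introduced is treated as tested before being introduced.

```python
from typing import Dict, List, Set

def check_test_match(program_subjects: Set[str],
                     tested_subjects: Set[str],
                     topic_intro_month: Dict[str, int],
                     tested_topics: List[str],
                     test_month: int) -> List[str]:
    """Return a list of problems; an empty list means both minimum rules are met.

    Rule 1: every subject taught in the program is tested.
    Rule 2: no tested topic is introduced after the testing month
            (a topic missing from topic_intro_month counts as not yet introduced).
    """
    problems = []
    for subject in sorted(program_subjects - tested_subjects):
        problems.append(f"program subject not tested: {subject}")
    for topic in tested_topics:
        intro = topic_intro_month.get(topic)
        if intro is None or intro > test_month:
            problems.append(f"tested before introduced: {topic}")
    return problems
```

For example, a program teaching math and reading, tested only in math in month 3 on a fractions unit that begins in month 6, would produce two problem messages.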

Language tests for selection, diagnosis, and impact evaluation. The best ways to design selection and diagnostic tests are still highly controversial subjects among language test developers, and there are problems with all such tests that are currently available. Even greater problems arise when using selection or diagnostic tests to measure language improvement. These problems are noted in Item 6-B.







Other Issues 



Selecting tests with non-English language norms. Basically, non-English language norms adequate for impact evaluation do not exist. The Inter-American Tests provide user norms based on students in bilingual programs using that test. The norms provided with the Spanish CTBS do not represent the population of Spanish/English bilingual students. Norms for both tests can only provide a possible standard for performance-level evaluations. (See Item 6-A.)

Test level (floor and ceiling effects). In some bilingual programs, the at-grade-level test is too difficult for program students at pretest time, and the next lower level may be too easy at posttest time. If the mean score on a test is less than 25 percent of the items correct or more than 75 percent of the items correct, floor or ceiling effects probably exist. See Item 6-A.
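The 25/75 percent rule of thumb above is easy to apply to a group's raw scores. A minimal sketch, assuming raw scores are counts of items correct on a test with a known number of items; the function name is hypothetical.

```python
from typing import Sequence

def floor_ceiling_flag(scores: Sequence[float], n_items: int,
                       low: float = 0.25, high: float = 0.75) -> str:
    """Flag a likely floor or ceiling effect from the group's mean proportion correct.

    Returns "floor" if the mean is below `low` of the items correct,
    "ceiling" if it is above `high`, and "ok" otherwise.
    """
    if not scores:
        raise ValueError("no scores")
    mean_prop = sum(scores) / len(scores) / n_items
    if mean_prop < low:
        return "floor"
    if mean_prop > high:
        return "ceiling"
    return "ok"
```

On a 40-item test, a group averaging about 8 items correct (20 percent) would be flagged for a floor effect.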

Longitudinal and multi-grade-level requirements. Most bilingual programs offer several grade levels. Therefore, it is desirable to have achievement tests that can be compared across grades and that can be used to follow groups of students as they progress through the grades. In practice, this means using one of the well-known achievement tests from the major publishing companies. See Item 6-A.

Criterion-referenced tests (CRTs). In recent years, CRTs have been advocated widely as a solution to the many problems of standardized (norm-referenced) tests. In fact, the advocates of CRTs have not solved the problems. They have merely attempted to avoid them by asking different kinds of evaluation questions. Where the basic question concerns program impact, the reliability and validity of the tests are of primary importance. Rephrasing the impact questions in CRT terminology simply helps to obscure or ignore the fundamental reliability and validity problems. Although, in principle, CRTs can be just as reliable and valid as norm-referenced tests (in fact, a single test can be both norm- and criterion-referenced), in practice CRTs often lack reliability and are likely to reduce the accuracy of an impact evaluation.
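One way to check the reliability concern raised above, for a CRT or any test of right/wrong items, is an internal-consistency coefficient such as Kuder-Richardson formula 20 (KR-20). The guide does not prescribe this statistic; the sketch below is one common choice, assuming dichotomously scored items (1 = correct, 0 = incorrect) and using population variances throughout.

```python
from typing import List

def kr20(responses: List[List[int]]) -> float:
    """KR-20 reliability for 0/1 item responses (rows = students, columns = items).

    KR-20 = k/(k-1) * (1 - sum(p*q) / var_total), where p is each item's
    proportion correct, q = 1 - p, and var_total is the variance of total scores.
    """
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two students and two items")
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have no variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)
```

Short CRTs with few items per objective tend to produce low values of such coefficients, which is one concrete form of the reliability problem described in the text.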

Language of testing. There are no definitive guidelines as to which language should be used for testing subjects other than language (i.e., math, science, culture). If students are very weak in one language, it seems obvious that that language is inappropriate for testing. Some PIP field-test sites in which students were reasonably skilled in both languages tested math in both languages. In these sites, the language of testing had little effect on scores. In general, the language of testing should be determined after considering the goals of the program and the language of instruction, as well as the language proficiency of the students.

Must English and non-English language tests come from the same publisher? This question applies mainly to Spanish-English programs, since few tests are available in other languages. While there are some advantages to dealing with a single test publisher, it is more important to get the most appropriate tests in each language. Limiting choices to tests that are published in two languages is an unnecessary restriction.






6-A

Selecting an Achievement Test

In selecting achievement tests for the evaluation of bilingual programs, evaluators must consider all the same criteria that are used in selecting any achievement test, as well as additional criteria that relate to the nature of the program and the student population. This discussion will give most emphasis to issues in test selection that are especially important for bilingual education evaluations.

Test Bias 

During the last ten years, extensive attention has been given to the effects of test bias for culturally different populations (Wargo, 1975; Houts, 1977). As a result, test publishers have made concerted efforts in this area, and many standardized achievement tests have been revised. The technical manual of a test will often include a discussion of what procedures were undertaken to minimize bias. The two most common procedures are (1) review of the content of the items by a culturally sensitive panel and (2) statistical item analysis.

Review of content. Reading and examining the content of items may result in rewriting items so that they seem fairer to all groups involved. However, a visual examination alone cannot determine whether an item is biased, i.e., whether it will function differently for different groups of students. What can be accomplished is the elimination of stereotypical wording or content. External review panels have the advantage of ensuring a disinterested reading, although in-house groups may also be effective. This procedure may result in a more acceptable test, but it will not necessarily eliminate biased items.

Item analysis. Item analysis is a statistical procedure that is performed routinely in test construction. The scores of students on each item are compared to their scores on the whole test in order to determine whether each item is measuring what the whole test measures and, in fact, should be part of that test. When this procedure is used to eliminate bias toward a specific group, the test is administered both to the general population and to the specific group. Item analysis is then performed to determine whether the same items function similarly for both groups. For example, if item 1 is difficult for one group, it should be difficult for the other, regardless of the mean test scores for each group. If an item is easy for one group but difficult for another, then such an item exhibits bias and should probably be eliminated.
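The comparison described above can be sketched by computing each item's difficulty (proportion correct) separately in the two groups and removing each group's overall level before comparing, so that a group scoring uniformly lower does not by itself cause items to be flagged. This is a simplified illustration, not the procedure of any particular publisher; the function names are hypothetical, the 0.15 cutoff is an arbitrary assumption, and operational bias studies use more refined statistics.

```python
from typing import List

def item_p_values(responses: List[List[int]]) -> List[float]:
    """Proportion correct for each item (rows = students, 0/1 columns = items)."""
    n = len(responses)
    return [sum(row[j] for row in responses) / n for j in range(len(responses[0]))]

def flag_biased_items(group_a: List[List[int]], group_b: List[List[int]],
                      threshold: float = 0.15) -> List[int]:
    """Return indices of items whose relative difficulty differs between groups.

    Each item's p-value is centered on its group's mean p-value, so only items
    that are unusually easy for one group and unusually hard for the other
    (relative to that group's overall level) are flagged.
    """
    pa = item_p_values(group_a)
    pb = item_p_values(group_b)
    mean_a = sum(pa) / len(pa)
    mean_b = sum(pb) / len(pb)
    return [j for j in range(len(pa))
            if abs((pa[j] - mean_a) - (pb[j] - mean_b)) > threshold]
```

Centering on each group's mean p-value mirrors the text's point that an item's difficulty should be judged "regardless of the mean test scores for each group."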

Additional Selection Issues

Consideration of subtest content and weight in scoring is important for selecting the test that most closely matches the curriculum and for determining whether in-level testing is appropriate. Such issues are important for all students, but they may be even more critical for students of limited English proficiency. Although the curriculum of bilingual programs may contain the same final objectives, skills such as English reading may not be taught at the same grade levels as in other programs.

The wording of the instructions to the test should be considered. The language of the instructions should not be more difficult than the language used in the items that actually appear in the test. Although directions containing needlessly complex sentence structure are a handicap for all students, they will cause even greater difficulty for students of limited English proficiency. Examiners may want to consider systematically simplifying test directions, but if norms are to be used, this may affect their validity.

Additionally, the content of the test should be examined to determine the extent to which it tests the out-of-school experience of the children. The experience of the culturally different child and of the low-SES child may differ significantly from that assumed by the authors of the test. Therefore, the more the test relies on out-of-school experience, the more it may discriminate against the target population and the less valid it will be for evaluating program impact.







Finally, if bilingual tests are used, the nature of the translation should be considered. Some tests are direct translations except where such a translation would clearly be impossible. Other tests provide equivalent versions in which the kinds of items and the difficulty level are roughly equivalent, but the content of the items may be completely different. Still other tests combine both methods. In a translated test, the difficulty level may not be the same for both versions. However, very few test publishers provide equivalent versions.

Language of Testing 

In many bilingual education evaluations, the evaluator must decide in what language to test. Several questions have to be considered individually and in relation to each other. First, what is the language of instruction for the subject that will be tested? Because the language of instruction for math, for example, may be different for students in the same class or may be different at various times during the year, this question may not be answered simply. Second, what is the dominant language of the child as established by a systematic assessment procedure? Third, what are the project goals? Goals may require testing in a particular language. Ideally, of course, students should be tested in the language in which they will do best. However, that language may not always be the dominant one. For example, a student may be more fluent in Spanish, but if almost all math instruction has been in English, the student may perform better on an English test.

There are other issues involved in planning testing in more than one language that have not yet been studied in sufficient detail. Some evaluators double-test the project students, avoiding the choice of test language by testing in both languages. The benefits of this practice are clear: more information is obtained about the students' proficiency in content and language, and the dangers of testing only in the weaker language are avoided. However, the additional expense, the added burden on teachers and students, and the possibility of practice effects represent significant disadvantages. In addition, the language of some students may be neither standard English nor standard Spanish.







Where tests exist in two languages, Spanish may be the most appropriate language for the pretest; but after a year of English instruction, English may be the most appropriate language for the posttest. Longitudinal studies will almost certainly include scores in both languages reported at different stages of a student's progress. Evaluators will have to consider carefully the interpretation of such scores.

Limits to the Usefulness of Norms

The use of national norms as a comparison standard in an evaluation relies on the validity of a principle known as the equipercentile assumption. This assumption implies that, in the absence of any special instructional treatment, students in the project would have grown at a rate comparable to that of students in the norming sample who obtained the same mean pretest value. Such an assumption can only be valid if the project population is similar in educationally relevant ways to the population represented in the norming sample. This is not usually the case in bilingual education programs, which are generally composed of students of limited English proficiency, bilingual students, and a larger proportion of low-SES students than is found in the general population. While the accuracy of the equipercentile assumption for such populations has not yet been systematically assessed, it is unlikely that norms for English achievement tests can provide precise no-treatment expectations for bilingual project students. There are no statistical techniques to adjust for differences in expected growth between the project students and the norming population (Tallmadge, 1976).
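Under the equipercentile assumption, a project group's no-treatment expectation is the posttest score that holds the same percentile rank in the posttest norms as the group's mean pretest score holds in the pretest norms; an impact estimate is then the observed posttest mean minus that expectation. The sketch below illustrates only this arithmetic. The norm tables are invented for the example, linear interpolation between table entries is an assumption, and, as the text stresses, the result is only as trustworthy as the assumption itself.

```python
from bisect import bisect_left
from typing import List, Tuple

def interpolate(xs: List[float], ys: List[float], x: float) -> float:
    """Linearly interpolate y at x from strictly increasing xs; clamp outside the table."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def no_treatment_expectation(pre_mean: float,
                             pre_norms: List[Tuple[float, float]],
                             post_norms: List[Tuple[float, float]]) -> float:
    """Expected posttest mean if the group keeps its pretest percentile standing.

    pre_norms and post_norms are (score, percentile_rank) pairs sorted by score,
    with percentile ranks strictly increasing.
    """
    pct = interpolate([s for s, _ in pre_norms], [p for _, p in pre_norms], pre_mean)
    return interpolate([p for _, p in post_norms], [s for s, _ in post_norms], pct)

# Hypothetical fall and spring norm tables: (raw score, percentile rank).
fall = [(10, 10), (20, 30), (30, 50), (40, 70), (50, 90)]
spring = [(15, 10), (26, 30), (36, 50), (46, 70), (55, 90)]
```

With these invented tables, a fall mean of 25 sits at the 40th percentile, so the spring expectation is 31; an observed spring mean of 34 would suggest a gain of 3 raw-score points attributable to the program, if the assumption held.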

Recently, data have been gathered on Spanish-language achievement tests. The most recent editions of the Comprehensive Tests of Basic Skills (CTBS) and the Inter-American Series both furnish norms tables for English and Spanish versions of their tests, but the manner in which such norming data were compiled limits their usefulness for evaluating the impact of bilingual projects. The CTBS Español norms were developed by administering the CTBS in both languages to a balanced bilingual, biliterate population as determined by scores on the SERVS test. The assumption was made that a student's standing in the norms would be the same in English and Spanish. Students' scores in Spanish were then equated with their rank in the English norms. Although the assumption that a perfectly bilingual person will possess the same knowledge of content in two languages is logical, the possibilities for error are so large that the Spanish norm conversions can provide only very rough estimates of student achievement. There are several other reasons why the CTBS norms cannot be used to provide a precise estimate of project impact. Because the scores in the norms table are extrapolated rather than derived empirically, they are subject to a certain amount of error inherent in any estimation procedure. In addition, the balanced bilingual population in the sample is not comparable to the population of most bilingual programs, which include students with a range of language proficiencies. Finally, because the students in the sample were in bilingual programs, they do not provide an estimate of how similar students would have performed without any special instruction.
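The equating step described above, in which a Spanish score is assigned the standing the same students hold in English, is a form of equipercentile equating. A minimal sketch, assuming one group took both versions and using a simple "percentage of scores at or below" definition of percentile rank (publishers typically use smoothed, midpoint-based definitions): each Spanish raw score is converted to a percentile rank within the group's Spanish scores and then mapped to the lowest English score whose percentile rank is at least as high. The function names and data are hypothetical.

```python
from typing import List

def percentile_rank(scores: List[int], x: int) -> float:
    """Percentage of scores at or below x."""
    return 100.0 * sum(1 for s in scores if s <= x) / len(scores)

def equate_spanish_to_english(spanish_scores: List[int],
                              english_scores: List[int],
                              spanish_score: int) -> int:
    """Map a Spanish raw score to the English score with the same standing in the group."""
    target = percentile_rank(spanish_scores, spanish_score)
    for e in sorted(set(english_scores)):
        if percentile_rank(english_scores, e) >= target:
            return e
    return max(english_scores)
```

With Spanish scores [10, 12, 15, 18, 20] and English scores [14, 17, 21, 25, 30] from the same five students, a Spanish score of 15 (60th percentile rank) maps to an English score of 21. Any error in the assumption that the two standings are the same passes directly into the converted scores.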

The Inter-American norms were not constructed from a national probability sample. They are "user norms" derived only from those groups in the population to whom the Inter-American tests were administered in the course of local evaluations. For certain tests, the sample obtained in this way numbers over a thousand students, but for others the N is less than 100, severely limiting the reliability of the normative data, particularly in the extreme score ranges, where estimates are based on relatively few cases. Because the norming group was not specifically constructed to represent the population of limited-English and bilingual students, unknown biases may exist in the sample. Because students in the sample are also in bilingual programs, the norms do not provide an estimate of how similar students would have performed in the absence of a special program.
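The point about extreme score ranges is simple arithmetic: the number of norming cases expected beyond a given percentile is the sample size times the proportion of cases beyond it. The sketch below compares the two sample sizes mentioned above; the function name and the 95th-percentile cutoff are arbitrary choices for illustration.

```python
def expected_cases_above(n: int, percentile: float) -> float:
    """Expected number of norming cases scoring above the given percentile."""
    return n * (100 - percentile) / 100

for n in (1000, 100):
    print(n, expected_cases_above(n, 95))
```

With N = 1000, about 50 students would lie above the 95th percentile; with N = 100, only about 5, so norm values in that range would rest on a handful of scores.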

The question of how a group of students would have performed without a bilingual project cannot be answered by simply consulting currently available norms. But existing norms can be used to answer other evaluation questions. Well-constructed norms based on national probability samples, such as those provided by the major achievement tests, can be used to show how the bilingual project students compare to national averages. Norms based on more specific populations, such as those constructed for the Spanish versions of the CTBS and the Inter-American Series, can be used to show how project students compare to the bilingual, biliterate CTBS sample or to the bilingual project students in the Inter-American sample.

Out-of-level testing. The use of tests at levels below those recommended by the publisher is an option if the content of the program can be measured better this way. Students in bilingual programs may be learning skills, such as English reading, at a later time than other students and, therefore, should receive the same test at a later point. In order for any test to be suitable, the average score of the group tested should be between 1/3 and 3/4 of the maximum (Roberts, 1976). Otherwise, ceiling or floor effects depress estimates of student gains. Some publishers provide norms for the administration of a single test in several grades. Other publishers provide expanded standard scores that link all levels of a test on a common scale and, occasionally, locator tests to facilitate out-of-level testing. Generally, a test should be used no more than one level below that recommended by the publisher. But care should be taken that, in testing out of level, pretest floor effects are not being replaced by posttest ceiling effects.
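The two rules of thumb for out-of-level testing (a group mean between 1/3 and 3/4 of the maximum score, and no more than one level below the publisher's recommendation) can be combined into a simple check when expected pretest and posttest means are available for candidate levels, for example from prior-year data. In the sketch below, the function name and inputs are hypothetical, levels are assumed to be consecutive integers, and both pretest and posttest means are checked to avoid trading a pretest floor for a posttest ceiling.

```python
from typing import Dict, Optional, Tuple

def choose_test_level(recommended: int,
                      candidates: Dict[int, Tuple[float, float, float]]) -> Optional[int]:
    """Pick a test level whose expected pretest and posttest means avoid floor and ceiling.

    recommended: publisher's in-level number.
    candidates: level -> (expected_pre_mean, expected_post_mean, max_score).
    Returns the in-level if suitable, else the level one below, else None.
    """
    def suitable(pre: float, post: float, max_score: float) -> bool:
        return all(max_score / 3 <= m <= 0.75 * max_score for m in (pre, post))

    for level in (recommended, recommended - 1):   # never more than one level below
        if level in candidates and suitable(*candidates[level]):
            return level
    return None
```

A result of None signals that neither the in-level test nor the next lower level fits the group, which is a cue to look for a different test rather than go further out of level.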

Introduction to test list. An extraordinary number of tests could be used to evaluate basic subject areas for bilingual programs. Some of these tests are locally developed and have not been administered to large samples of the population. Therefore, they are less likely to have the technical qualities required by most evaluators. Other tests are limited to only one content area and cannot be used by themselves to evaluate a bilingual project that includes several content areas. Finally, many evaluators will first consider the appropriateness of tests already in use in the district for the evaluation of the bilingual program. Certain tests may be mandated, or choices may be constrained in other ways. Selection of a test already being used for district-wide assessment introduces the possibility of comparison with local non-project students. This comparison alone cannot provide a precise estimate of project impact, but it may answer other evaluation questions, such as how project students compare in achievement level and rate of growth to other students in the district.







The annotated test list that follows is an attempt to provide helpful information about tests that, for the reasons discussed above, are already likely to be under consideration by project evaluators.

Only major tests of achievement that include both math and reading or language subtests were considered. All such tests available in two languages were included. Tests available only in English were limited to those included in the Anchor Test Study (Loret, 1974). Finally, all of the tests are discussed only as they apply to evaluations of grades K-6.

The same categories of information are provided for each test to facilitate comparison. All of the tests are available from major publishers. Technical aspects of such tests are likely to be as good as the state of the art. All of the tests have technical manuals describing the process of test construction and standardization. Except for an occasional subtest, all of the tests are designed to be administered in groups. Administration time for each test varies according to the number of subtests used. Subtests are listed only where they contribute to a total score in reading, language arts, or mathematics, three major areas of interest to bilingual program evaluation.







REFERENCES 



Hoepfner, R. Achievement test selection for program evaluation. In Wargo, M. J., and Green, D. R. (Eds.), Achievement testing of disadvantaged and minority students for educational program evaluation. CTB/McGraw-Hill, 1977.

Houts, P. The myth of measurability. New York: Hart Publishing Company, Inc., 1977.

Loret, P. G., et al. Anchor test study. Washington: U.S. Government Printing Office, 1974.

Rhodes-Hoover, M., Politzer, R. L., & Taylor, O. Bias in achievement and diagnostic reading tests: A linguistically oriented view. Unpublished manuscript, Stanford University, 1975.

Roberts, A. O. H. Out-of-level testing. Mountain View, CA: RMC Research Corporation, 1978.

Tallmadge, G. K. Cautions to evaluators. In Wargo, M. J., and Green, D. R. (Eds.), Achievement testing of disadvantaged and minority students for educational program evaluation. CTB/McGraw-Hill, 1977.

Wargo, M. J., & Green, D. R. Achievement testing of disadvantaged and minority students for program evaluation. CTB/McGraw-Hill, 1977.




California Achievement Test, 1977-78 
Forms C and D 

Languages: English

Publisher's recommended in-level use:

Level    Grade
10       K.0-K.9
11       K.6-1.9
12       1.6-2.9
13       2.6-3.9
14       3.5-4.9
15       4.5-5.9
16       5.5-6.9



Subtest Components:

Pre-reading

Listening for Information

Letter Forms

Letter Names

Letter Sounds

Visual Discrimination

Sound Matching
Reading

Vocabulary

Comprehension

Phonic Analysis

Structural Analysis
Language Total

Language Mechanics

Language Expression
Mathematics Total

Computation

Concepts and Applications

Level: 10 11 12

X 
X 
X 
X 
X 
X 



.4 15 16 



X 
X 
X 



X 
X 

X 
X 



X 
X 
X 
X 

X 
X 

X 
X 



X 
X 
X 
X 

X 
X 

X 
X 



X 
X 



X 
X 

X 
X 



X 
X 



X 
X 

X 
X 



X 
X 



X 
X 



Norming: Weeks, rather than midpoint dates, are provided for empirical fall and spring norms. These are the week in which November 3rd falls and the week in which May 4th falls. Tests can be administered up to two weeks on either side of these weeks without the use of interpolated norms.

Out-of-level testing: Provides an expanded standard score scale and a locator test.

Procedures for minimizing bias: Test writers followed guidelines to avoid bias in the development and editing of items. Items were reviewed by representatives of various ethnic and cultural groups. An extensive item analysis was conducted with the tryout items to compare responses of "Black" students and "other" students. A point-biserial correlation was used to show the relation of items to category objective scores, and grade-to-grade growth as shown by item difficulties was also examined. The percent of biased items found in the trial items for the various subject areas ranged from 25 to 7 percent. After revision, the percent of biased items was reduced to the 0-3 percent range.
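The CAT norming window described above can be checked for a planned test date. The sketch assumes weeks run Monday through Sunday (the test list does not say how the publisher defines a week) and reads "two weeks on either side" as fourteen days before the start or after the end of the norming week; both are assumptions to verify against the publisher's manual, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple

def norming_week(year: int, month: int, day: int) -> Tuple[date, date]:
    """Monday-to-Sunday week containing the given date (week definition is assumed)."""
    d = date(year, month, day)
    start = d - timedelta(days=d.weekday())   # weekday(): Monday = 0
    return start, start + timedelta(days=6)

def within_norming_window(test_date: date, anchor_month: int, anchor_day: int,
                          slack_weeks: int = 2) -> bool:
    """True if test_date falls in the norming week or within slack_weeks of it.

    Use anchor (11, 3) for the fall norms and (5, 4) for the spring norms.
    """
    start, end = norming_week(test_date.year, anchor_month, anchor_day)
    slack = timedelta(weeks=slack_weeks)
    return start - slack <= test_date <= end + slack
```

In 1977, November 3 fell on a Thursday, so under these assumptions the fall norming week runs October 31 to November 6 and empirical fall norms apply from October 17 through November 20.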






CIRCUS 
1976 



1. Languages: English

2. Publisher's recommended in-level use:

Level
Circus A
Circus B
Circus C
Circus D

Grade
Nursery School and Kindergarten - Fall
Kindergarten - Spring
First Grade - Fall
First Grade - Spring
Second Grade - Fall
Second Grade - Spring
Third Grade - Fall

3. Subtests:*


Level 



Pre-reading 
Reading 

Listen to the Story 
Listening 

How Much and How Many 
Mathematics 
Writing Skills 



X X 
X 



*Many other subtests are provided, but only those that coordinate with the STEP are listed here. No total scores are possible from any combination of subtests.

The subtests listed above provide coordination, through content and expanded standard scores, with the following subtests of STEP III, Levels E-J: Reading, Listening, Math Concepts and Math Computation, and Writing Skills.

4. Norming: The Circus was administered to a national probability sample during the fall (October) only. Therefore, the comparison of a group to the national sample for pre- and posttesting can be done for a fall-to-fall evaluation design only. Information is also provided in sentence form describing what each range of scores means in terms of skills mastered. A fall-to-spring comparison of the proportion of students falling in each category could be made, but it would require the use of a local comparison group to determine the normal growth expectation. Separate tables exist for comparing groups and for comparing individuals. The normative data are very well suited to individual student evaluation because the national sample is divided into subgroups such as sex, geographic region, and SES.

5. Out-of-level testing: Expanded standard scores can be used for subtests that coordinate with STEP III.

6. Procedures for minimizing bias: No statistical procedures are reported. Separate norms are provided according to categories such as sex, geographic region, and SES.






EL CIRCO 
1979 



1. Languages: Spanish and English

Spanish tests allow the test administrator to select among alternatives the word most appropriate for the students' variety of Spanish.

2. Publisher's recommended in-level use: Tests can be used at preschool, kindergarten, and the beginning of first grade.

3. Subtests:*

Cuanto y Cuantos

Para Qué Sirven Las Palabras

What Words are For

Cuanto y Cuantos is a direct translation of Level A of How Much and How Many of CIRCUS. Para Qué Sirven Las Palabras and What Words are For are equivalent, but one is not a translation of the other. For example, each test has items testing comprehension of the past tense, but the items have different content.

4. Norming: The El Circo measures were administered to a nationwide sample of children from the Spanish-speaking cultural groups. Empirical norms exist for fall only.

5. Out-of-level testing: Separate norms exist for preschool, kindergarten, and first grade.

6. Procedures for minimizing bias: Items were reviewed by a cultural advisory committee composed of speakers of Puerto Rican, Mexican, and Cuban Spanish.

*Several tests have been developed as part of El Circo, but only the ones listed are available for spring 1980.






Comprehensive Test of Basic Skills 
English Version 1973, Spanish Version 1978 
Form S 

1. Languages: English and Spanish

The CTBS/Español is a direct translation of the English CTBS/S, with the exception of certain items that could not be translated or that required different translations for dialects of Spanish. In such cases, equivalent items have been constructed.

2. Publisher's recommended in-level use:

            English CTBS/S       CTBS/Español
Level B     Grades K.6-1.9       Grade 1
Level C     Grades 1.6-2.9       Grade 2
Level 1     Grades 2.5-4.9       Grades 3 and 4
Level 2     Grades 4.5-6.9       Grades 5 and 6

3. Subtest components:

                                 Level
Component                        B   C   1   2
Reading
  Word Recognition               X
  Reading Vocabulary                 X   X   X
  Reading Comprehension          X   X   X   X
Mathematics
  Math Computations              X   X   X   X
  Concepts & Applications        X   X   X   X

4. Norming: The norms for the Spanish version of the CTBS were derived through a spring equating with the nationally representative English-language norms. The no-treatment expectation obtained by their use is not referenced to a limited-English-proficiency population but rather to the English-language performance that could be expected from the bilingual/biliterate population on whom the equating was done. The scoring patterns in both English and Spanish for limited-English-proficiency students may be quite different; therefore, the norms do not present a precise standard of comparison. Empirical norms exist for the English CTBS for spring for grades 2-6, and for fall and spring for grades K and 1.

5. Out-of-level testing: An expanded standard score scale is available for the CTBS/S norms.

6. Procedures for minimizing bias: Prior to standardization, items were reviewed by Black and Spanish-speaking consultants. In addition, trial items were administered to a sample of Black students and "other" students. Items with a point-biserial coefficient of less than .2 were rejected. A subsequent analysis was made of the test results of Black students, Spanish-speaking students, and other students. Although the mean scores were lower for the Black and Spanish-speaking groups, the tests appeared to be functioning similarly for both groups.
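The screening rule reported for the CTBS (reject items whose point-biserial coefficient falls below .2) can be sketched as follows. The point-biserial coefficient is the Pearson correlation between a 0/1 item score and the total score. This sketch correlates each item with the uncorrected total (which includes the item itself); the test list does not say which variant the publisher used, and the function names are hypothetical.

```python
from math import sqrt
from typing import List

def point_biserial(item: List[int], totals: List[int]) -> float:
    """Pearson correlation between 0/1 item scores and total scores."""
    n = len(item)
    mi = sum(item) / n
    mt = sum(totals) / n
    cov = sum((i - mi) * (t - mt) for i, t in zip(item, totals)) / n
    si = sqrt(sum((i - mi) ** 2 for i in item) / n)
    st = sqrt(sum((t - mt) ** 2 for t in totals) / n)
    if si == 0 or st == 0:
        return 0.0   # chosen convention: an item everyone answers alike does not discriminate
    return cov / (si * st)

def screen_items(responses: List[List[int]], cutoff: float = 0.2) -> List[int]:
    """Return indices of items whose point-biserial with the total is below the cutoff."""
    totals = [sum(row) for row in responses]
    k = len(responses[0])
    return [j for j in range(k)
            if point_biserial([row[j] for row in responses], totals) < cutoff]
```

An item answered correctly mainly by low scorers gets a negative coefficient and is rejected. Note that this screen checks only whether items agree with the total score; as the text on item analysis explains, detecting bias also requires comparing item behavior across groups.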






Inter-American Series: Test of Reading, 1962-69 
Forms CE, DE, CEs, DEs 



1. Languages: English, Spanish, and French

Spanish version is an exact translation of English version.

2. Publisher's recommended in-level use:

Level 1     Grade 1.5-2.5
Level 2     Grade 2.5-3.9
Level 3     Grades 4, 5, 6



3. Subtest components:



Level 



Components 




Vocabulary 

Comprehension 

Level of Comprehension 

Speed of Comprehension 



X 
X 



X 
X 



X 
X 



4. Norming: The Inter-American norms were not developed using a probability sample. They are based on data collected from test users. The test manual states that these norms "should be applied with caution until local norms can be developed." Although N's for some tests consist of more than a thousand students, others comprise fewer than a hundred students. For these reasons, the norms do not provide a convincing, precise standard of comparison.



5. Out-of-level testing: Norms are provided for out-of-level testing; however, the above comments regarding norms should be taken into account.



6. Procedures for minimizing bias: Content was selected that is familiar to English and Spanish speakers of the Western Hemisphere. A semantic frequency list was consulted in wording the translation, but the manual states that frequency is not always an indication of difficulty level. Spanish trial items were administered to Spanish speakers, and English trial items were administered to English speakers, after which item analysis and item selection were performed.






Inter-American Series: Test of General Ability, 1961-72
Forms CE, DE, CEs, and DEs



1. Languages: English and Spanish

Spanish version is an exact translation of English version.



2. Publisher's recommended in-level use:

Preschool Level    Ages 4 and 5
Level 1            End of kindergarten, Grade 1
Level 2            Grades 2, 3
Level 3            Grades 4, 5, 6



3. Subtest components:



Level 



Components 



Preschool



1 



2 



3 



Oral Vocabulary 

Number 

Association

Classification

Analogies

Sentence Completion
Computation
Word Relations
Number Series



X 
X 



X 
X 



X 
X 
X 
X 
X 
X 



4. Norming: The Inter-American norms were not developed using a probability sample; the norms are based on data collected from test users. The test manual states that these norms "should be applied with caution until local norms can be developed." Although N's for some tests consist of more than a thousand students, others comprise fewer than a hundred students. For these reasons, the norms do not provide a convincing, precise standard of comparison.

5. Out-of-level testing: Norms are provided for out-of-level testing; however, the above comments regarding norms should be taken into account.

6. Procedures for minimizing bias: Content was selected that is familiar to English and Spanish speakers of the Western Hemisphere. A semantic frequency list was consulted in wording the translation, but the manual states that frequency is not always an indication of difficulty level. Spanish trial items were administered to Spanish speakers, and English trial items were administered to English speakers, after which item analysis and item selection were performed.







Iowa Tests of Basic Skills, 1978
Forms 7 and 8



1. Languages: English

2. Publisher's recommended in-level use:

Level                     Grade      Forms
Primary Battery 5         K.1-1.5    7
Primary Battery 6         K.8-1.9    7
Primary Battery 7         1.7-2.6    7
Primary Battery 8         2.7-3.5    7
Multilevel Battery 9      3          7 and 8
Multilevel Battery 10     4          7 and 8
Multilevel Battery 11     5          7 and 8
Multilevel Battery 12     6          7 and 8



3. Subtest components:



Level 



8 9 10 11 12 



Reading 

Reading Comprehension X X X X 

Pictures X X 

Sentences X X 

Stories X X 

Reading X 
Vocabulary XXXXXXXX 
Math 

Math Concepts X X 

Math Problems X X 

Math Computations X X X X X X 

Math X X 

Language 

Spelling X X X X X X 

Capitalization X X X X X X 

Punctuation X X X X X X 

Usage X X X X X X 

Language X X 

Listening X X X X 

4. Norming: Empirical norms exist for 15 October and 15 April.

5. Out-of-level testing: An expanded standard score scale is provided.

6. Procedures for minimizing bias: Authors with diverse cultural backgrounds participated in writing the test.
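Each norming entry above is tied to specific midpoints (for the ITBS, 15 October and 15 April), and norm-referenced comparisons are most defensible when testing is done close to those dates. The sketch below, using only Python's standard `datetime` module, measures how far a test date falls from the nearest midpoint. The 14-day tolerance and the function names are hypothetical, not requirements stated by any publisher.

```python
from datetime import date

# Norming midpoints as (month, day); these two come from the ITBS entry above.
ITBS_MIDPOINTS = [(10, 15), (4, 15)]


def nearest_midpoint(test_date, midpoints):
    """Return (days_away, midpoint_date) for the midpoint closest to test_date,
    checking the previous, current, and following calendar years."""
    candidates = [
        date(year, month, day)
        for year in (test_date.year - 1, test_date.year, test_date.year + 1)
        for month, day in midpoints
    ]
    closest = min(candidates, key=lambda d: abs((d - test_date).days))
    return abs((closest - test_date).days), closest


def within_norming_window(test_date, midpoints, tolerance_days=14):
    """True if testing falls within tolerance_days of a norming midpoint.
    The 14-day default is a hypothetical tolerance, not a publisher rule."""
    days_away, _ = nearest_midpoint(test_date, midpoints)
    return days_away <= tolerance_days


print(within_norming_window(date(1980, 10, 20), ITBS_MIDPOINTS))  # True
print(within_norming_window(date(1980, 12, 1), ITBS_MIDPOINTS))   # False
```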






Metropolitan Achievement Tests
(MAT) 1978, Forms Jl and Kl

1. Languages: English

2. Publisher's recommended in-level use:

Level           Grade
Primer          K.5-1.4
Primary 1       1.5-2.4
Primary 2       2.5-3.4
Elementary      3.5-4.9
Intermediate    5.0-6.9

3. Subtest components:






Reading Comprehension*
Language
Listening Comprehension
Punctuation and Capitalization
Usage
Grammar and Syntax
Spelling
Study Skills
Math
Numeration
Geometry and Measurement
Problem Solving
Operations: Whole Numbers
Operations: Laws and Properties
Operations: Fractions & Decimals
Graphs & Statistics



Primer

Primary 1

Primary 2

Elementary

Intermediate



X 

X 

X 
X 



X 
X 
X 
X 

X 
X 

X 
X 



X 
X 
X 
X 

X 
X 

X 
X 



X 
X 
X 
X 

X 
X 

X 

X 



X 
X 
X 
X 

X 
X 

X 
X 



*Additional reading subtests such as rate and auditory discrimination are available, but they are not part of the comprehension score.

Norming: Empirical fall and spring norms have been developed with midpoints of 15 October and 20 April, respectively.

Out-of-level testing: Provides an expanded standard score scale. Out-of-level testing should be no more than one level below that recommended for the grade.

Procedures for minimizing bias: A combination of objective and subjective methods was used to identify ethnically biased items on the MAT. Following review by a panel of ethnically diverse educators, test items were examined for bias using three conceptually different statistical detection methods. Items tagged as biased by either the subjective or objective procedures were subsequently revised or eliminated.






Sequential Tests of Educational Progress
(STEP) III, 1979, Forms X and Y



1. Languages: English

2. Publisher's recommended in-level use:

Level             Grade
Intermediate E    3.5-4.5
Intermediate F    4.5-5.5
Intermediate G    5.5-6.5



3. Subtest components:

                                        Level
Component                               E   F   G
Reading Total
  Vocabulary                            X   X   X
  Comprehension                         X   X   X
  Inference                             X   X   X
Math
  Mathematics Basic Concepts            X   X   X
  Mathematics Computations              X   X   X
Language: Writing Skills
  Spelling                              X   X   X
  Capitalization                        X   X   X
  Word Structure and Usage              X   X   X
  Sentence and Paragraph Organization   X   X   X
Language: Listening
  Listening Comprehension               X   X   X
  Following Directions                  X   X   X



4. Norming: Empirical norms are available for fall and spring. Midpoints of the norming periods are 5 October and 10 May.

5. Out-of-level testing: Provides an expanded standard score scale and also out-of-level norms. Has a locator test.

6. Procedures for minimizing bias: Items were edited by in-house minority and women test specialists, and by an external minority review panel.

7. Additional comments: Can be used in conjunction with CIRCUS, 1978, because of the coordination of test content and an expanded standard score scale.






SRA Achievement Series, 1976, Forms 1 and 2

Languages: English

Publisher's recommended in-level use:

Level    Grade
B        1.5-2.5
C        2.5-3.5
D        3.5-4.5
E        4.5-6.5



Subtest components:

                              Level
Component                     A   B   C   D   E
Reading
  Visual Discrimination       X
  Auditory Discrimination     X   X
  Letters/Sounds              X   X   X
  Listening Comprehension     X   X   X
  Vocabulary                      X   X   X   X
  Comprehension                   X   X   X   X
Mathematics
  Concepts                    X   X   X   X   X
  Computation                     X   X   X   X
  Problem Solving                             X
Language Arts
  Mechanics                           X   X   X
  Usage                               X   X   X
  Spelling                            X   X   X



Norming: The norms are based on a nationally representative sample of students. Empirical spring norms are available, with temporary interpolated fall norms. Empirical fall norms are currently being developed. Empirical fall and spring norming dates are 7 October and 25 April.

Out-of-level testing: Out-of-level testing can be interpreted using the SRA expanded standard score scale known as GSV (Growth Scale Value).

Procedures for minimizing bias: Items were edited by representatives of minority groups and women. The trial items were administered to a sample that included Black, Hispanic, American Indian, and nonminority subsamples. The items were then examined statistically, and items which were easy for one group but difficult for another were eliminated.
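The SRA screen described above, eliminating items that were easy for one group but difficult for another, can be approximated by comparing each item's proportion correct (its p-value) across groups. The sketch below is a simplified stand-in, not SRA's actual statistical method, which the entry does not describe. It subtracts each group's mean p-value before comparing, so that a group scoring lower on every item does not get every item flagged; the 0.25 threshold, the function names, and the data are hypothetical.

```python
def p_values(responses):
    """Proportion of students answering each item correctly."""
    n = len(responses)
    return [sum(row[i] for row in responses) / n for i in range(len(responses[0]))]


def flag_differential_items(group_a, group_b, max_gap=0.25):
    """Flag items whose relative difficulty differs between two groups.
    Each group's mean p-value is removed first, so an overall difference
    in group performance does not by itself flag every item."""
    pa, pb = p_values(group_a), p_values(group_b)
    mean_a, mean_b = sum(pa) / len(pa), sum(pb) / len(pb)
    flagged = []
    for i, (a, b) in enumerate(zip(pa, pb)):
        gap = (a - mean_a) - (b - mean_b)
        if abs(gap) > max_gap:
            flagged.append(i)
    return flagged


# Hypothetical tryout data (rows = students, columns = items).
group_a = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
group_b = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 0, 0]]
print(flag_differential_items(group_a, group_b))  # [1]
```

Item 1 is fairly easy for the first group but missed by everyone in the second, so it is the one flagged for review.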






Stanford Achievement Test, 1973 
Forms A, and C 



1. Languages: English

2. Publisher's recommended in-level use:

Level              Grade
Primary I          1.5-2.4
Primary II         2.5-3.4
Primary III        3.5-4.4
Intermediate I     4.5-5.4
Intermediate II    5.5-6.9

3. Subtest components:

Total Reading
  Reading Comprehension
  Word Study Skills
Total Mathematics
  Concepts
  Computation and Applications
  Computation
  Applications
Total Auditory
  Vocabulary
  Listening Comprehension



Primary Primary Primary Interme- Interme- 



II 



X 
X 

X 
X 



X 
X 



III 

X 
X 



dlate I dlate II 



X 
X 



X 
X 



X 
X 



X 
X 

X 
X 



X 
X 

X 
X 



X 
X 

X 
X 



X 
X 

X 
X 



4. Norming: Empirical norms are available with midpoints of 8 October for grades 2-9, 8 May for grades 1-9, and 8 February for grades 1 and 2.

5. Out-of-level testing: Provides an expanded standard score scale. Testing more than one level out of level is not recommended.

6. Procedures for minimizing bias: Items were edited by a group of consultants with various minority backgrounds.

7. Other comments: Scaled score is continuous with the Stanford Early School Achievement Test (SESAT) and the Stanford Test of Academic Skills (TASK).






TEST OF BASIC EXPERIENCE II 
(TOBE) 1976 



1. Languages: English and Spanish

The Spanish version is a direct translation from the English, with the exception of items that would radically change in translation; in such cases, equivalent items were constructed. The Spanish version of the test occasionally provides a choice of words so that the most common version of words can be used with Mexican, Cuban, or Puerto Rican students.



2. Publisher's recommended in-level use:

Level    Grade
K        Preschool, kindergarten, fall of first grade
L        Spring of kindergarten, first grade

3. Subtests:

                  Level K   Level L
Mathematics       X         X
Language          X         X
Science           X         X
Social Studies    X         X



4. Norming: Empirical norms exist only for the English version of the test; midpoints are October 19 and April 19.

5. Out-of-level testing: Provides expanded standard score scales.

6. Procedures for minimizing bias: Test items were reviewed by a panel of women and minority consultants. The Spanish version of the test was reviewed by native speakers of Puerto Rican, Cuban, and Mexican Spanish.






Selecting a Language Proficiency Test 






In order to select a language proficiency test, program personnel may consult catalogues of tests which are available. Some catalogues offer straight descriptions of instruments, while others offer an evaluative assessment of the tests.[2] It may be difficult to make a decision when confronted with so many choices of tests. In an effort to assist districts in this task, several states have convened panels of professionals with expertise in language proficiency testing for the purpose of examining and rating tests and making temporary recommendations.[3] The reports of such meetings are helpful to districts since they often explain the criteria upon which tests were selected and indicate the ratings given to each test.



Bye, T. T. Tests that measure language ability: A descriptive compilation. Berkeley, California: BABEL/LAU Center, 1977.

Dissemination and Assessment Center for Bilingual Education. Evaluation instruments for bilingual education: An annotated bibliography. Austin, Texas: DACBE, 1976.

Northwest Regional Educational Laboratory, Center for Bilingual Education. Assessment instruments in bilingual education: A descriptive catalogue of 342 oral and written tests. Los Angeles, California: National Dissemination and Assessment Center, 1978.

[2]
Northwest Regional Educational Laboratory. Oral language tests for bilingual students: An evaluation of language dominance and proficiency instruments. Portland, Oregon: NWREL, 1976.

Fletcher, B. P., Locks, H. A., Reynolds, D. F., and Sisson, B. G. A guide to assessment instruments for limited English speaking students. New York, New York: Santillana Publishing Company, Inc., 1978.

[3]
Law, A. Proceedings of the Bilingual Instrument Review Committee (AB 3470). Sacramento, California: Office of Program Evaluation and Research, California State Department of Education, September 28, 1978.

Texas Education Agency. Report from the committee for the evaluation of language assessment instruments, 1977.






The most important points to be taken into account in selecting a language proficiency test depend on the use to which the test will be put. Most districts use test results as the criterion (or one of the criteria) for classifying students as either limited in English or proficient in English. The validity of the test for this purpose, then, is of primary concern.

The test should provide a cutoff score or range and information about validity studies to support the cutoff. Unfortunately, at the time of printing, very few tests have adequate validity data, and cutoff levels vary from test to test. This means that the same child might be classified as "limited-English-speaking" if Test A is administered and as "fluent-English-speaking" if Test B is administered. Studies are now being conducted[4] to compare and equate language proficiency tests, and some helpful results should soon be available for making more informed decisions about using tests for program placement. Meanwhile, caution should be exercised in relying on any single test for classifying students.

Another consideration related to validity concerns the scoring system. A test that has versions in two or more languages should provide a proficiency score in each language. It is illogical and inappropriate, however, to provide a proficiency rating in one language based only on proficiency in the other language. While a dominance classification can be derived from proficiency scores, a proficiency score cannot be determined on the basis of proficiency in the other language or on the basis of dominance.
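The one-way relationship described above can be made concrete: dominance is computed from two separately measured proficiency scores, while the English classification uses the English score alone. In this sketch the 1-to-5 level scale, the cutoff level, and the class and function names are all hypothetical; each test defines its own scales.

```python
from dataclasses import dataclass


@dataclass
class ProficiencyScores:
    english_level: int  # hypothetical 1 (lowest) to 5 (highest), from the English version
    spanish_level: int  # same hypothetical scale, from the Spanish version


def dominance(scores):
    """Dominance is derived FROM the two proficiency scores."""
    if scores.english_level > scores.spanish_level:
        return "English dominant"
    if scores.spanish_level > scores.english_level:
        return "Spanish dominant"
    return "Balanced"


def limited_in_english(scores, cutoff_level=3):
    """English proficiency is judged on the English score alone.
    The cutoff level is a hypothetical district choice."""
    return scores.english_level <= cutoff_level


# A student weak in both languages is "English dominant" yet still limited in English.
student = ProficiencyScores(english_level=2, spanish_level=1)
print(dominance(student), limited_in_english(student))  # English dominant True
```

Reversing the direction, that is, calling this student proficient in English because English is the dominant language, would misclassify exactly the kind of student the text warns about.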

It is difficult to get reliable, valid language proficiency scores for kindergarteners and first graders, particularly on the more global measures. One way to improve the situation is to be sure test administration procedures are strictly standardized and that children are not


[4] See, for example, Gilmore, G., & Dickerson, A. The relationship between instruments used for identifying children of limited English speaking ability in Texas. Houston: Region IV Education Service Center, 1979.






distracted. Children in this age range have a short attention span and may not be willing to sit still for 20 minutes. This problem might be overcome by administering a test in two parts. Since it is only possible to test certain aspects of language with any one test, and since valid, reliable results are not assured, teacher judgment should play a part in arriving at decisions concerning classification and program placement.

If the test is to be used as an achievement measure as well as a classification measure, as is the case in many programs, then several other issues should be considered.[5] First, the test needs to have enough items so that growth can be detected. Second, when children are tested every fall and spring with the same test, they may memorize parts of the test, particularly stories. Fall-to-fall testing would help, but then whatever unknown amount of growth occurs during the summer would be mixed into the gain and could not be separated from the program's effect. A third consideration concerns units for measuring growth. Some tests provide a score in the form of one of five levels. Setting goals and reporting growth in this way masks growth that occurs within levels. A test should provide raw scores as well as levels, and growth should be reported in terms of raw scores. It may also be interesting to report changes in levels.
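To illustrate how reporting only levels can hide within-level growth, the sketch below assumes a hypothetical test scored 0 to 50 whose raw scores are banded into five levels. The cut points are invented for illustration and do not come from any test reviewed here.

```python
from bisect import bisect_right

# Hypothetical cut points: lowest raw score (0-50 scale) for levels 2, 3, 4, 5.
LEVEL_CUTS = [11, 21, 31, 41]


def raw_to_level(raw_score):
    """Convert a raw score to a level from 1 to 5."""
    return bisect_right(LEVEL_CUTS, raw_score) + 1


def growth_report(pre_raw, post_raw):
    """Report growth both in raw-score points and in levels."""
    return {
        "raw_gain": post_raw - pre_raw,
        "level_gain": raw_to_level(post_raw) - raw_to_level(pre_raw),
    }


print(growth_report(21, 30))  # {'raw_gain': 9, 'level_gain': 0}
print(growth_report(20, 21))  # {'raw_gain': 1, 'level_gain': 1}
```

The first student gains 9 raw-score points with no change in level, while the second gains a single point and moves up a level, which is why raw-score growth is the more informative figure.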

Here is a list of additional points to take into account in selecting and in interpreting the results from language proficiency tests:

1. Instructions should be simple and totally understandable to the student; they should be provided in the language the child knows best.

2. Administration procedures should be clearly spelled out so that they can be standardized across administrations.



[5] Using the same test for selection and pre-post outcome evaluation will introduce bias due to regression toward the mean, resulting in exaggerated gains (see Horst, Tallmadge, and Wood, 1975).




3. Elicitation tasks should not require unnatural language; the responses expected should be those of an average native speaker of the same age speaking in normal conversational style.

4. Items should not require tasks that are above the developmental level of the student.

5. Items should not require metalinguistic awareness or linguistic manipulation, since these may not be indicators of proficiency.

6. Items should measure aspects of language and not other things such as memory, literacy, and willingness to talk.

7. The content of the test should be within the student's cultural experience.

8. Proficiency should not be determined strictly on the basis of quantity of speech.

9. A test that is too long or too short may be unreliable.
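The warning in the footnote above, that using the same test for selection and for pre-post evaluation exaggerates gains, can be checked with a small simulation. It assumes a simple model in which each observed score is a fixed true score plus independent random error, students are selected because their pretest score falls below a cutoff, and the program has no effect at all. Any average gain among the selected students is then produced by regression toward the mean alone. The score scale, error size, cutoff, and function name are hypothetical.

```python
import random


def mean_gain(cutoff=40.0, n_students=10_000, seed=1):
    """Mean posttest-minus-pretest gain for students whose pretest is below cutoff.
    Model (an assumption): observed score = fixed true score + independent error,
    and the program has no effect, so true scores do not change."""
    rng = random.Random(seed)
    gains = []
    for _ in range(n_students):
        true_score = rng.gauss(50, 10)
        pretest = true_score + rng.gauss(0, 5)
        posttest = true_score + rng.gauss(0, 5)
        if pretest < cutoff:  # selected with the same test used as the pretest
            gains.append(posttest - pretest)
    return sum(gains) / len(gains)


print(round(mean_gain(), 1))            # positive: a spurious "gain" with no program effect
print(round(mean_gain(cutoff=1e9), 1))  # no one screened out: gain close to zero
```

Selecting on a second, independent test (or using a different pretest) removes most of this artifact, because the selection errors are then unrelated to the pretest errors.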






Annotated List of Language Proficiency Tests

This annotated list of language proficiency tests is short and provides project directors and evaluators with much of the information necessary to make a well-informed choice. The criterion used for including tests in the list is the following: each test is recommended (at the time of printing) by at least one of the three states having the largest number of bilingual education programs. The tests are primarily in Spanish and English and range from kindergarten level to high school. A brief description is offered of each test, as well as comments on the linguistic and technical properties of the tests. The comments are points that evaluators and project directors should be well aware of in selecting a test or in interpreting test results. The comments were drawn from several sources, including the experience of districts in the bilingual PIP field test study and published articles and critiques. Each publisher was given an opportunity to respond to the review and to include "Publisher's Comments." This information has been incorporated into the reviews.






Descriptions of Commonly Used
Language Proficiency Tests

Basic Inventory of Natural Language (BINL)

Languages: English and Spanish (can be used for other languages)

What It Tests: Speaking

Levels and Grades: K-12

Administration: Individually administered. Requires 10-15 minutes. Pictures are used to elicit natural speech, and ten sentences are tape recorded for later analysis.

Scoring: Hand or machine scored.

Interpretation: Yields raw scores that can be converted to one of four levels: NES, LES, FES, PES ("proficient"). Age is taken into account in determining levels.

Comments: Pictures are large and attractive, with multicultural content. It is difficult to standardize administration procedures, since there is no set of "items" but rather an elicitation technique. Complex to score by hand. Scored on the basis of linguistic complexity and length of sentences; these criteria may not always be valid indicators of proficiency.

No information is provided on the validity of the proficiency categories. Information on validity is limited to correlations of sentence length with complexity, and correlations of complexity scores with an oral reading test. Reliability data is limited to correlations between the first half and the second half of the test; these correlations were high. Some districts have found that the test classifies fluent speakers as "limited" (see Gilmore and Dickerson, 1979).

Publisher's Comment: Standardization is facilitated by adequate training and close adherence to BINL procedures. Machine scoring procedures produce reports of five different types, from classroom listings to district summaries, including pre-post averages and minimum, maximum, and average scores by grade level. A recent study establishes averages for grades K-12 based on a sample of 125,000 students. The standard error allows for valid adjustment of scores. The format of the test permits retesting on invalid tests, which have been reported to be less than 4% of tests submitted for machine scoring. Percentile rank of scores is now included in reports.
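The BINL review above notes that reliability evidence is limited to the correlation between the first and second halves of the test. A half-length correlation understates the reliability of the full-length test, and the Spearman-Brown formula, r_full = 2r / (1 + r), is the standard way to step it up. Whether the BINL authors applied that correction is not stated here, so the sketch below is a general illustration with hypothetical scores, not a reanalysis of BINL data.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation using population moments."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(first_half_scores, second_half_scores):
    return spearman_brown(pearson(first_half_scores, second_half_scores))


# Hypothetical half-test scores for five students.
first = [4, 6, 7, 9, 10]
second = [5, 5, 8, 8, 11]
print(round(split_half_reliability(first, second), 2))
```

A first-half/second-half split can be affected by fatigue or by items getting harder toward the end of a test, which is one reason reviewers look for other reliability evidence as well.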






Bilingual Syntax Measure (BSM)

Languages: English and Spanish

What It Tests: Speaking

Levels and Grades: Level I (ages 4 to 9); Level II (not available for review)

Administration: Individually administered. Requires 10-15 minutes. Students respond orally to questions based on pictures.

Scoring: Hand scored.

Interpretation: Provides language dominance (when both English and Spanish tests are administered), level of second language acquisition, and degree of maintenance or loss of the first language. Assigns students to one of five proficiency levels in each language. Additionally, provides instructional suggestions for reading and ESL which correspond to each of the five English proficiency levels.

Comments: Attractive, colorful pictures are used to elicit speech through structured conversation. Responses are scored strictly on the correctness of specific grammatical structures. The choice of grammatical structures is based on research studies on the sequence of acquisition of morphemes. Allows for regional language variation. A number of discussions of this test have been published, including Hernández-Ch., 1978[1] and Rosansky, 1979[2].

Both test-retest reliability and inter-scorer reliability are reported in the Technical Handbook. Although the reported reliability is low, the authors attempt to explain why this is so (TH, p. 45).

[1] Hernández-Ch., Eduardo. Critique of a critique: Issues in language assessment. Journal of the National Association for Bilingual Education, March 1978, Vol. II, No. 2.

[2] Rosansky, E. J. A review of the Bilingual Syntax Measure. In B. Spolsky, Advances in language testing. Arlington, VA: Center for Applied Linguistics, 1979.






Comprehensive English Language Test for Speakers of English as a Second Language (CELT)

Language: English

What It Tests: Listening comprehension, grammar, and vocabulary. Contains three subtests: (1) Listening, (2) Structure, and (3) Vocabulary.

Levels and Grades: High school, college, and adult. Designed for intermediate to advanced ESL students.

Administration: Group administered. Listening requires 40 minutes; Structure requires 45 minutes; Vocabulary requires 35 minutes. A recording can be used to administer the listening test. All test items are multiple choice. Students respond to oral and written stimuli by marking an answer sheet.

Scoring: Scored with a key.

Interpretation: Yields percent correct for each test. Percentile scores are available (but see Comments). Does not provide proficiency classifications. No cutoff score is provided for classification of students as limited in English proficiency, since the test was not designed for this purpose.

Comments: Oral production is not tested.

All test items on each subtest are multiple choice items that require reading; therefore, the measures of listening comprehension, structure, and vocabulary are each confounded with literacy skills. The authors recommend the Vocabulary subtest for use with students who have had advanced training in reading.

The three subtests had moderate to high internal consistencies with four groups of foreign students and, therefore, very reasonable standard errors of measurement. No information is given on predictive validity. Tentative evidence of concurrent validity is offered based on correlations with other standard ESL tests. Tentative norms for five different groups, based on small samples, are provided. The norms are not appropriate for use in most bilingual programs, however, since the students in the norming sample are not similar to most students in bilingual programs.
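The CELT review above links moderate to high internal consistency with reasonable standard errors of measurement. In classical test theory that link is the formula SEM = SD × √(1 − r), where r is the reliability coefficient. The sketch below applies it; the standard deviation and reliability values are illustrative, not figures from the CELT manual, and the function names are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def score_band(observed, sd, reliability, k=1.0):
    """Observed score plus or minus k standard errors of measurement."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - k * sem, observed + k * sem


# Illustrative values only: same spread, two different internal consistencies.
print(standard_error_of_measurement(10, 0.91))  # about 3
print(standard_error_of_measurement(10, 0.75))  # 5.0
print(score_band(60, 10, 0.75))                 # (55.0, 65.0)
```

With the score spread held constant, raising reliability from .75 to .91 shrinks the SEM from 5 points to about 3, so individual scores can be interpreted more precisely.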






Ilyin Oral Interview Test

Languages: English

What It Tests: Speaking

Levels and Grades: Secondary and adult.

Forms: There are two forms (BILL and TOM), and each has a long version (50 items) and a short version (30 items).

Administration: Individually administered. Requires up to 30 minutes. The students respond orally to pictorial stimuli and questions. Items are ordered in difficulty, and the interview is terminated when a frustration level is reached.

Scoring: Hand scored.

Interpretation: Yields raw scores. No cutoff score is given to identify students as "limited" in English proficiency; however, suggestions are given for placement levels in adult ESL programs, and a range is suggested as the degree of proficiency required for jobs in which oral communication with the public is limited.

Comments: The requirement to answer in a complete sentence is an unnatural one and may depress scores of students who fail to do this. The long version can become monotonous, since many pictures are repeated.

Internal consistency reliabilities are high. No information is given for test-retest reliability or interrater reliability. Validity information is limited to correlations with other tests and is based on very small samples.






Language Assessment Battery (LAB)

Languages: English and Spanish

What It Tests: Listening, speaking, reading, and writing. Level I has three subtests: (1) Listening and Speaking, (2) Reading, and (3) Writing. Levels II and III have four subtests: (1) Listening, (2) Reading, (3) Writing, and (4) Speaking.

Levels and Grades: Level I, grades K-2; Level II, grades 3-6; Level III, grades 7-12.

Administration: Level I: Individually administered; requires 5-10 minutes. Levels II and III: Part is individually administered; requires 41 minutes. Students respond to verbal, written, and pictorial stimuli by pointing, by giving oral responses, by writing, and by marking answer sheets (on Levels II and III only).

Scoring: Hand scored; parts scored with a key.

Interpretation: Yields raw scores, stanines, and percentiles by grade. Students scoring below the 20th percentile may be classified as limited in English proficiency.

Comments: The speaking section of Level I, Test I, contains only 6 items, all of which may be answered with one word. The writing tests measure reading skills in addition to writing skills.

The test went through all the stages of preparation by expert and experienced item writers, pilot studies, item and test analyses, and norming on substantial samples (20 schools, and about 500 students at each level from K through 12). The technical manual is a model.

One study[1] has shown that the Level I English test does not discriminate well in the range near the cutoff point for classifying students as limited in English. This reduces its value for use as a pre-post measure.

[1] Hubert, J. An investigation of the Language Assessment Battery (English, Level I) for Title VII students in Hartford. Unpublished manuscript, 1978.
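The LAB rule above (students below the 20th percentile may be classified as limited in English proficiency) depends on how a percentile rank is computed. In practice the percentile would be read from the publisher's norms tables by grade. The sketch below instead assumes one common definition (percent of the norm group scoring below the student, counting tied scores as half) and a hypothetical norm group, purely to show the mechanics of the cutoff; the function names are hypothetical too.

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting ties as half."""
    below = sum(1 for s in norm_scores if s < score)
    ties = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


def may_be_limited_english(score, norm_scores, cutoff_percentile=20.0):
    """Apply the below-the-20th-percentile rule to a single raw score."""
    return percentile_rank(score, norm_scores) < cutoff_percentile


# Hypothetical same-grade norm group whose raw scores are 1 through 100.
norm_group = list(range(1, 101))
print(percentile_rank(15, norm_group), may_be_limited_english(15, norm_group))  # 14.5 True
```

Because percentile conventions differ in how ties are handled, a student just at the cutoff can land on either side of it depending on the definition used, one more reason the text urges caution near cutoff points.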






Language Assessment Scales (LAS)

Languages: English and Spanish

What It Tests: Listening comprehension and speaking. Five subtests form the total score for both levels: (1) discrimination of minimal phonemic pairs, (2) vocabulary production, (3) phoneme production, (4) syntax comprehension, and (5) story production.

Levels and Grades: Level I, grades K-5; Level II, grades 6-12.

Administration: Individually administered. Requires 20 minutes. Stimuli consist of tape recorded speech and pictures. Students respond orally and by pointing.

Scoring: Hand scored. Interrater reliability should be obtained on the storytelling task. Age is taken into account in scoring.

Interpretation: Yields a score of 1 to 100, which can be converted to levels 1 to 5. Students who score at level 3 or below are classified as "limited English (or Spanish) speakers."

Comments: This is a fairly comprehensive overall aural-oral proficiency test. There are problems with the phonemic discrimination section, since this task requires a kind of metalinguistic awareness students may not have. The story retelling task measures not only production, but also comprehension.

Interrater reliability coefficients for the story retelling task are moderately high. Coefficients of internal item consistency for discrete-point items range from .36 to .96.

Validation consisted of one-way analyses of variance of relatively small samples (one to two hundred) of students dichotomized into English-dominant and Spanish-dominant groups on the basis of teacher judgment.

Several studies of reliability were done on small samples (21 English and 35 Spanish) using various approaches. The sample sizes were too small to justify some of the analyses and the conclusions drawn from them.






Primary Acquisition of Language (PAL) Oral Language Dominance Measure (OLDM)
Oral Language Proficiency Measure (OLPM)

Languages: English and Spanish

What It Tests: Listening comprehension and speaking

Levels and Grades: PAL OLDM, K-3; OLPM, 4-6

Administration: Individually administered. Requires 15 minutes for each language. Students respond orally to oral and pictorial stimuli.

Scoring: Hand scored.

Interpretation: Yields raw scores ("G scores") that are converted to proficiency levels 1 to 5. Also yields dominance categories. Students who score at level 4 or below are classified as "limited English (or Spanish) speakers."

Comments: Simple to use and score. Scored on the basis of grammaticality and appropriateness of responses as well as quantity of speech.

The test was developed "as a result of research by the El Paso Public Schools."

Item analyses were used in the construction of the tests, although samples were somewhat small (about 200 drawn from three grades in high schools). Validity is quoted in terms of the test's ability to grade schools in correct order, and of correlations with a reading test. The latter were fair, being around 0.3 to 0.5.







Shutt Primary Language Indicator Test (SPLIT) 
Languages: English and Spanish

What It Tests: Listening comprehension, speaking, reading, and
grammar.

There are three subtests: (1) Listening Comprehension,
(2) Verbal Fluency, and (3) Reading Comprehension
and Grammar.

Levels and Grades: Listening Comprehension, Verbal Fluency, K-6;
Reading Comprehension and Grammar, 3-6.

Administration: Listening Comprehension: Group administered;
requires 35 minutes; tape recording available.

Verbal Fluency: Individually administered; requires
15 minutes.

Reading Comprehension and Grammar: Group
administered; requires 30 minutes.

Instructions are provided in both languages and
are available on tape. Stimuli are oral, pictorial,
or written. Students respond orally, by marking
pictures in an answer book, or by marking
an answer sheet.

Scoring: Hand scored; parts scored with a key.

Interpretation: Yields raw scores, percentile ranks, and age and
grade equivalents.

Yields a dominance classification.

Comments: Yields no cutoff point to classify students as
limited in English proficiency (independent of the
Spanish/Portuguese score). A proficiency classification
is given based on the dominance classification.
This wrongly assumes that students are highly
proficient in the dominant language. A student whose
English score is very low can be classified as
"English Adequate" if the student's Spanish score is
also very low, but lower than the English score.
Districts should establish their own cutoff points
for classifying students' proficiency in English.

Grade equivalent scores should not be used. 
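The weakness described in these comments can be illustrated with a small sketch. It contrasts a dominance-only rule (proficiency inferred from whichever language scores higher) with an independent, locally set English cutoff, as the comments recommend. The function names, the cutoff of 25, the sample scores, and the "Limited English" label are hypothetical; SPLIT's actual scoring tables are not reproduced here.

```python
def dominance_label(english: float, spanish: float) -> str:
    """Dominance-only rule (the approach criticized above): the student is
    treated as proficient in whichever language scores higher, however
    low both scores are. Ties count as not English dominant (assumption)."""
    return "English Adequate" if english > spanish else "Limited English"


def cutoff_label(english: float, cutoff: float) -> str:
    """Independent rule: the English score alone is compared with a
    district-established cutoff point (the cutoff value is hypothetical)."""
    return "English Adequate" if english >= cutoff else "Limited English"


# Hypothetical student: very low in both languages, slightly lower in Spanish.
english_score, spanish_score = 8, 5
print(dominance_label(english_score, spanish_score))  # English Adequate
print(cutoff_label(english_score, cutoff=25))          # Limited English
```

The same student receives opposite labels under the two rules, which is the misclassification the comments warn against.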






A System for Comparing Curriculum Content with the Content
of CTBS Spanish and English, Levels B and C



In order to measure program effects, the selection of a test that
measures what is being taught is very important. Several systems have been
developed to systematically compare curriculum content and test content.¹
Presumably, the evaluator will compare the program curriculum to several
adequate and available tests and select the test that most closely matches
the curriculum. The evaluator of a bilingual program has very few choices.
At the time of this writing, the CTBS is the only widely used standardized
achievement test battery that is available in both Spanish and English.
Because this test is so widely used, a system of comparing its content
to that of any curriculum would have wide application.

Uses of the System 

Test selection. As stated above, very few major comprehensive tests
exist for the evaluation of bilingual programs. However, there are many
locally developed tests that have been distributed and other tests that
are fairly limited in scope. Also, there are districts that choose to
develop and use criterion-referenced tests. Additional tests will undoubtedly
be developed. With a test like the CTBS, there is the option of using only
the subtests that are appropriate or of testing out-of-level. Therefore,
careful comparison of test content with curriculum content can be used to
discard the CTBS if it is totally inappropriate or to select the best
combination of subtests and/or the most appropriate levels.

Test interpretation. An evaluator may select the CTBS knowing that
it does not match the curriculum as well as is desirable. A careful
analysis of the test and the curriculum can still be a valuable tool for
data analysis. The test items that match curriculum content can be analyzed
separately from those which do not. If the gain for the matching items is
greater than for the non-matching items, then a case can be made for
program impact versus simple maturation.

¹ Morris, L. L. and Fitz-Gibbon, C. T. "Determining How Well a Test
Fits the Program" in How to Measure Achievement. Beverly Hills, California:
Sage Publications, 1978.
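A minimal sketch of the matching/non-matching comparison described under "Test interpretation," assuming the evaluator has the proportion of students answering each item correctly on the pretest and the posttest. The item numbers, proportions, and the split into matching and non-matching items are hypothetical, and the sketch performs no significance test; it only computes the two average gains being compared.

```python
from statistics import mean


def mean_gain(pre: dict[int, float], post: dict[int, float], items: list[int]) -> float:
    """Average pretest-to-posttest gain in proportion correct over `items`."""
    return mean(post[i] - pre[i] for i in items)


# Hypothetical proportion correct per item (item number -> proportion).
pre = {1: 0.40, 2: 0.35, 3: 0.50, 4: 0.45, 5: 0.30, 6: 0.55}
post = {1: 0.70, 2: 0.65, 3: 0.75, 4: 0.55, 5: 0.38, 6: 0.62}

matching = [1, 2, 3]      # items judged to match the curriculum
non_matching = [4, 5, 6]  # items judged not to match

gain_match = mean_gain(pre, post, matching)
gain_other = mean_gain(pre, post, non_matching)
print(f"matching: {gain_match:.3f}  non-matching: {gain_other:.3f}")
# matching: 0.283  non-matching: 0.083
```

A larger gain on the matching items is consistent with program impact rather than maturation alone, but it is one piece of evidence, not proof.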

Curriculum planning. Another use of such a comparison is to make
changes in the curriculum. This is not to suggest that "teach to the
test" becomes the rule, because a test will always sample only a small
amount of what is actually taught. However, curricula are always under
revision, and the criteria by which the success of instruction is to be
evaluated have some claim to consideration.

Limitations 

This instrument has been developed only for the first two levels of
the CTBS, Levels B and C, commonly used in first and second grade.
However, the steps outlined in this form could be used as a model for
examining higher levels of the test.

Directions for Use 

The attached forms are divided into three parts per grade level:
Spanish Reading, English Reading, and Math.

Each part consists of two sections: the Test/Curriculum Analysis,
which is to be completed by each teacher, and the Summary, which is to
be completed by the evaluator or other staff person. Where there are
several teachers per grade level, the summary should represent an average.
However, in cases where the instructional treatments varied so much that
the test results will be reported separately, a summary should be made
for each different treatment.

Time for Task

Estimated working time is one hour per teacher to complete the
analysis and several hours for the evaluator to explain the task to
teachers, distribute and collect forms, and develop summaries.






CTBS English Reading Level B (Grade 1) 



TEST/CURRICULUM ANALYSIS 
(to be completed by project teachers) 



Reading: Vocabulary from Tests 1, 2, and 3

1. As a result of the English language arts curriculum and other school
and non-school experiences, what words on the word list are students
likely to have seen, heard, read or used? Review the words on the
word list and circle each word that the students have not been exposed
to.

CAUTION: A child knows many more words than are taught in school.
Vocabulary is learned from many sources. Therefore, do
not limit your consideration of students' vocabulary to
what is covered in the curriculum.






CTBS English Vocabulary, Level B (all words from Tests 1-3)

a, after, and, animal, apple, are, around, at, balloon, beak, bed, big,
Bill, Billy, birthday, bitten, black, Bob, book, boom, box, boy, brown,
brownie, bug, bunt, button, by, cake, came, can, cans, car, changed,
children, chimney, Christmas, city, clamp, climb, clip, clock, clown, coat,
dab, dollar, done, door, down, dress, drink, eggs, enamel, father, finger,
fish, flower, fly, foot, for, Frank, frog, get, girl, girls, green, hand,
happy, has, have, he, head, help, her, here, him, himself, hope, hot, I,
in, into, is, it, Jerk, Joan, jump, kitchen, kitten, know, lean, let,
like, little, look, made, make, man, many, Mary, mender, mister, misters,
money, mother, Mrs., near, night, not, on, one, open, out, paint, party,
people, pet, pig, plate, prince, puppy, rabbit, rain, read, ready, rope,
safe, said, sat, school, schools, seal, see, sees, she, sheriff, show,
sister, sisters, sleep, some, street, surprise, Susan, table, take, tell,
the, these, they, this, to, took, toy, train, tree, truck, two, wagon,
[illegible], was, will, window, with, woman, won



Test 1: Word Recognition I

Number of items: 19

Task: The student listens to a word read aloud and selects the correct
printed word from four choices. Distractors consist of words that look
similar to the right answer. Some are nonsense words or misspellings.

2. Have the students had practice reading English words up to
three syllables long?

daily or weekly

only once or twice

none

3. Have the students had practice reading all the letters and letter
combinations that appear in Test 1? yes no

If no, list letters or combinations that are not included in
the curriculum:



Test 2: Reading Comprehension
Number of items: 24

Task: The student reads a sentence and selects an appropriate picture
from three choices. Distractors consist of pictures with an error in
gender, an error in number, or an error in content. About half of the items
consist of two sentences; the other half consist of one sentence only.
Sentences range from 3 to 10 words in length, with the average sentence
having five words.

4. Have students had practice reading sentences in English?

daily or weekly
only once or twice
none



Test 3: Word Recognition II
Number of items: 19

Task: The student chooses one of four printed words that best matches
a picture. Twelve of the 19 words are identical to Test 1, Word
Recognition I, but the tasks are different because in Test 1 students
respond to an aural clue and in Test 3 to a visual clue.

No specific questions.






CTBS English Reading Level B (Grade 1)
(to be completed by project evaluator)

SUMMARY

Numbers in parentheses refer to question numbers on preceding pages.



1. What percent of the reading test vocabulary are students likely to
have seen, heard, read or used? (1) ____%

[The vocabulary list contains 167 different words.]

Comments:



2. Have students been taught the language arts skills tested? (2, 3, 4)

Yes

No

Comments:



3. What major skills in the English language arts curriculum are not
represented on the test?



4. What percentage of the curriculum does this represent?
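Question 1 of this summary can be computed directly from the teachers' circled word lists. A minimal sketch, assuming each teacher reports how many words were circled and that the grade-level figure is the simple mean across teachers (the directions ask for "an average" without saying which kind). The teacher counts in the example are hypothetical; 167 is the list size stated above, and the same functions apply to the other word lists.

```python
def percent_familiar(total_words: int, circled: int) -> float:
    """Percent of the test vocabulary students are likely to have met.

    `circled` is the number of words a teacher circled as not yet
    encountered by the students.
    """
    if not 0 <= circled <= total_words:
        raise ValueError("circled must be between 0 and total_words")
    return 100 * (total_words - circled) / total_words


def grade_level_summary(total_words: int, circled_by_teacher: list[int]) -> float:
    """Simple mean of the teachers' percentages (one way to 'average')."""
    values = [percent_familiar(total_words, c) for c in circled_by_teacher]
    return sum(values) / len(values)


# Hypothetical: three first-grade teachers circled 12, 20 and 15 of 167 words.
print(round(grade_level_summary(167, [12, 20, 15]), 1))  # 90.6
```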






CTBS Español Reading Level B (Grade 1)

TEST/CURRICULUM ANALYSIS
(to be completed by project teachers)



Reading: Test Vocabulary

1. As a result of the Spanish language arts curriculum and other school
and non-school experiences, what words on the word list are students
likely to have seen, heard, read or used? Review the words on the
word list and circle each word that the students have not been exposed
to.

CAUTION: A child knows many more words than are taught in school.
Vocabulary is learned from many sources. Therefore, do
not limit your consideration of students' vocabulary to
what is covered in the curriculum.






CTBS Español Vocabulary, Level B (all words from Tests 1-3)

a, abajo, abiertas, abrir, agua, al, animal, [illegible], aquí, árbol,
asomar, [illegible], ayudar, bajar, bajarse, beber, blanco, bocina, cabemos,
cabeza, caja, calle, cama, camión, canción, Carlos, celos, cerdo, ciudad,
clave, cocina, cochino, [illegible], color, come, comida, conejo, conoce,
cuando, cuento, cuerda, cuidado, cumpleaños, de, dedo, dejó, dentro, dinero,
disfrazó, dólar, dolor, dos, duro, el, ella, en, encima, enorme, es, esta,
están, estas, este, falda, fiesta, flor, frota, fruta, fue, fuego, fuente,
gato, gente, globo, gota, grande, guante, gusta, hay, hecho, hechos, hermana,
hermanas, hermano, hermanos, [illegible], hermosos, hija, hijos, hizo, hojas,
huerta, huevos, insecto, jota, Juan, juguete, la, las, latas, leer, les,
libro, limpia, los, luego, Lupe, lleva, llora, lluvia, maní, mano, mantel,
mapa, María, me, minero, mira, mono, mosca, muchacho, muchachos, mujer,
nación, negro, nieve, niña, niñas, niño, noche, nuevo, papi, para, paseo,
pastel, pastor, payaso, peor, Pepe, pequeña, Pérez, perro, personas, pintar,
plato, pobre, príncipe, pronto, puede, pueden, puedo, puerta, rama, rana,
rata, ratón, reloj, rey, [illegible], ropa, sabían, salir, saltar, se, señor,
señora, sentado, sentido, sillas, solamente, solo, sombrero, son, sorpresa,
su, sueño, Susana, tanta, tiene, tomo, trago, traje, tren, un, una, unas,
usted, va, ve, vendido, venido, ventana, verdad, volvió, y, yo



Test 1: Reconocimiento de Palabras I (Word Recognition I)

Number of items: 19

Task: The student listens to a word read aloud and selects the correct
printed word from four choices. Distractors consist of words that look
similar to the right answer. They might begin or end with the same
sound, for instance.

2. Have the students had practice reading Spanish words up to
three syllables long?

daily or weekly

only once or twice

none



3. Have the students had practice reading all the letters or
combinations that appear in Test 1? yes no

If no, list letters or combinations that are not included in the
curriculum:



Test 2: Comprensión de Lectura (Reading Comprehension)
Number of items: 24

Task: The student must read a sentence and select an appropriate
picture from three choices. Distractors consist of pictures with an error
in gender, an error in number, or an error in content. About half of the
items consist of two sentences. The other half consist of one sentence
only. Sentences range from 3 to 12 words in length, with the average
sentence having five words.

4. Have students had practice reading sentences in Spanish?

daily or weekly

only once or twice

none



Test 3: Reconocimiento de Palabras II (Word Recognition II)
Number of items: 19

Task: The student chooses one of four printed words that best matches
a picture. Twelve of the 19 words are identical to Test 1, Word
Recognition I, but the tasks are different because in Test 1 students
respond to an aural clue and in Test 3 to a visual clue.

No specific questions.






CTBS Español Reading Level B (Grade 1)
(to be completed by project evaluator)

SUMMARY

Numbers in parentheses refer to question numbers on preceding pages.



1. What percent of the reading test vocabulary are students likely to
have seen, heard, read or used? (1)

____% [The vocabulary list contains 197 words.]

Comments:



2. Students have been taught the language arts skills tested. (2, 3, 4)

Yes

No

Comments:



3. What major skills in the Spanish language arts curriculum are not
represented by the test?



4. What percentage of the curriculum does this represent?






CTBS Spanish or English Math Level B (Grade 1) 



TEST/CURRICULUM ANALYSIS 
(to be completed by project teachers) 



Math Battery 

1. What percent of math curriculum is devoted to computations?

2. What percent of math curriculum is devoted to math concepts,
applications, and story problems?

3. Do students have adequate vocabulary in the language in which
they are tested so they understand all directions and word
problems? yes no



Test 4: Conceptos y Aplicaciones de Matemáticas/Mathematics Concepts
and Applications

Number of items: 24

Task: The student listens to a problem or a question read aloud and
selects from four possible answers.



4. Following is a list of the skills included in this test, with
the number of items devoted to each skill listed in parentheses.
Check in the space provided whether each skill is covered in
the curriculum and decide how many total items this represents.

Yes No 

value of numbers (2) 

addition and subtraction (4) 

numeration (3) 

equating a set to a number (1) 

equating a set to a number word (1) 

counting by twos (1) 

sets (1) 

subtraction story problem (2) 

missing addend (2) 

setting up story problems for addition (1) 

telling time (2) 

measurement (1) 

value of money (3)

____ of 24 items represent skills covered in the curriculum.

(Caution: Do not simply add checks. For each item checked, add
the number in parentheses at the end of that line.)
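The caution above amounts to a weighted count: each checked skill contributes its number of items, not one. A minimal sketch using the item counts listed for Test 4; the set of skills marked as not taught in the example is hypothetical.

```python
# Items per skill on Level B Test 4, as listed in the checklist above.
TEST4_ITEMS = {
    "value of numbers": 2,
    "addition and subtraction": 4,
    "numeration": 3,
    "equating a set to a number": 1,
    "equating a set to a number word": 1,
    "counting by twos": 1,
    "sets": 1,
    "subtraction story problem": 2,
    "missing addend": 2,
    "setting up story problems for addition": 1,
    "telling time": 2,
    "measurement": 1,
    "value of money": 3,
}
assert sum(TEST4_ITEMS.values()) == 24  # agrees with the stated number of items


def items_covered(checked_skills: list[str]) -> int:
    """Add each checked skill's item count, not the number of checks."""
    return sum(TEST4_ITEMS[skill] for skill in checked_skills)


# Hypothetical: everything is taught except time, measurement and money.
not_taught = {"telling time", "measurement", "value of money"}
checked = [skill for skill in TEST4_ITEMS if skill not in not_taught]
n = items_covered(checked)
print(len(checked), "checks;", n, "of 24 items;", f"{100 * n / 24:.0f}%")
# 10 checks; 18 of 24 items; 75%
```

Counting checks alone would report 10, whereas the skills checked actually account for 18 of the 24 items.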






Test 5: Computación de Matemáticas/Mathematics Computation



Number of items: 32

Task: The student computes written addition problems and chooses the
correct answer from a group of three. A page of subtractions, also with
three possible answer choices, follows. The time allotted to this subtest
averages one minute per computation.

5. What percent of the computations in the math curriculum are
represented by:

addition ____

subtraction ____

total 100%

6. What percent of the additions performed in the math curriculum
are represented by:

horizontal addition ____
vertical addition ____
total 100%

one digit addition ____
two digit addition ____
total 100%

7. What percent of the subtractions performed in the math curriculum
are represented by:

horizontal subtraction ____
vertical subtraction ____
total 100%

one digit subtraction ____
two digit subtraction ____
total 100%

subtractions requiring
borrowing ____






CTBS Spanish or English Math Level B (Grade 1)

SUMMARY

(to be completed by project evaluator)
Numbers in parentheses refer to question numbers on preceding pages.



1. Compare curriculum to test. (1, 2)

                                   Percent of    Percent of
                                   Curriculum    Test
Computations                       ____          57
Math concepts, applications,
  story problems                   ____          43

Match is appropriate? yes no

Comments:



2. Students have an adequate vocabulary for the math test? (3)

Yes

No

Comments:



3. In the math concept test, ____ out of 24, or ____% of the test,
represents items that students have practiced in the curriculum. (4)

Comments:






4. Compare curriculum to test. (5, 6, 7)

                                Percent of    Percent of
                                Curriculum    Test
addition                        ____          50
subtraction                     ____          50
horizontal addition             ____          31
vertical addition               ____          69
one digit addition              ____          31
two digit addition              ____          69
horizontal subtraction          ____          31
vertical subtraction            ____          69
one digit subtraction           ____          6
two digit subtraction           ____          94
subtraction with borrowing      ____          6

Math computation skills are represented in the curriculum in similar
proportion to their appearance on the test?

Yes

No

Comments:



5. What skills from the math curriculum are not represented by the test?



6. What percentage of the curriculum does this represent?






CTBS English Reading Level C (Grade 2)



TEST/CURRICULUM ANALYSIS
(to be completed by project teachers)



Reading: Test Vocabulary

1. As a result of the English language arts curriculum and other school
and non-school experiences, what words on the word list are students
likely to have seen, heard, read or used? Review the words on the
word list and circle each word that the students have not been exposed
to.

CAUTION: A child knows many more words than are taught in school.
Vocabulary is learned from many sources. Therefore, do not
limit your consideration to what is covered in the curriculum.






[CTBS English Vocabulary, Level C (all words from Tests 1-3): word list illegible in the source copy.]



Test 1: Reading Vocabulary



Number of items: 33

Task: The student listens to the definition of a word read aloud. For
each item, the student selects from four printed words the one that
best fits the definition. Distractors include antonyms, contextually
related words, and unrelated words.

2. Have the students had practice in supplying a word in English
to fit a definition?

yes, using a format identical to test items
yes, but using another format

no



Test 2: Reading Comprehension: Sentences
Number of items: 23

Task: The student reads a sentence and selects the word that best
completes the sentence. A block of four answer choices is offered
and is located at the point in the sentence where the word is missing:
initial position, medial position, or final position. The sentence
completion item most often occurs in the middle of the sentence.
The average sentence length is seven words.

3. Have the students had practice reading complete sentences in
English of at least seven words in length?

daily or weekly
only once or twice
none

4. Have the students had practice supplying a missing word in
a sentence?

yes, using a format identical to test items

yes, but using another format
no






Test 3: Reading Comprehension: Passages
Number of items: 18

Task: The student reads six brief passages. Each passage is followed
by two to four multiple choice questions to be answered by the student.
These questions involve literal and near literal recall, use of context
clues, stating main ideas, drawing conclusions, and recalling sequence.
Paragraphs range from 5 to 14 sentences in length. The average sentence
is 6 words long.

5. Have the students had practice reading paragraphs in English
that are at least 5 sentences in length?

daily or weekly

only once or twice

none



6. Have students had practice in answering questions based on
reading paragraphs in English?

daily or weekly

only once or twice

none



7. If students have had such practice, what percentage of classroom
questions based on reading paragraphs utilize the following skills?

                                  less than   between        more than
                                  20%         20% and 50%    50%
literal and near literal recall
use of context clues
stating main ideas
drawing conclusions
recalling sequence
other




CTBS English Reading Level C (Grade 2)
SUMMARY

(to be completed by project evaluator)
Numbers in parentheses refer to question numbers on preceding pages.



1. What percent of the reading test vocabulary are students likely to
have seen, heard, read or used? (1) ____% (The vocabulary
list contains 421 words.)

Comments:



2. The language arts skills that are tested are also part of the
curriculum. (2, 3, 4, 5, 6) Yes No

Comments:



3. Compare the kinds of questions asked in the reading test to the kinds
of questions asked in the reading curriculum. (7)

                                  Percent of    Percent of
                                  Curriculum    Test
literal and near literal recall   ____          50
use of context clues              ____          11
stating main ideas                ____          5.5
drawing conclusions               ____          28
recalling sequence                ____          5.5
other                             ____          0



The kinds of questions asked in the reading portion of the test are
also practiced in the reading curriculum in a fairly similar
proportion. Yes No



Comments:



4. What major skills in the English language arts curriculum are not
represented in the test?



5. What percentage of the curriculum does this represent?
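Teachers answer question 7 of the analysis in bands (less than 20%, between 20% and 50%, more than 50%), while this summary lists exact test percentages. A minimal sketch for checking whether each test percentage falls in the band a teacher chose; counting exactly 20% and 50% as "between" is an assumption, and the teacher responses in the example are hypothetical.

```python
TEST_PCT = {  # "Percent of Test" column from the summary table above
    "literal and near literal recall": 50,
    "use of context clues": 11,
    "stating main ideas": 5.5,
    "drawing conclusions": 28,
    "recalling sequence": 5.5,
    "other": 0,
}


def band(pct: float) -> str:
    """Map a percentage onto the three response columns of question 7."""
    if pct < 20:
        return "less than 20%"
    if pct <= 50:  # assumption: exactly 20% or 50% counts as "between"
        return "between 20% and 50%"
    return "more than 50%"


def mismatched_skills(teacher_bands: dict[str, str]) -> list[str]:
    """Skills whose test share falls outside the band the teacher checked."""
    return [skill for skill, pct in TEST_PCT.items()
            if teacher_bands.get(skill) != band(pct)]


# Hypothetical responses from one second-grade teacher.
teacher = {
    "literal and near literal recall": "more than 50%",
    "use of context clues": "less than 20%",
    "stating main ideas": "between 20% and 50%",
    "drawing conclusions": "less than 20%",
    "recalling sequence": "less than 20%",
    "other": "less than 20%",
}
print(mismatched_skills(teacher))
# ['literal and near literal recall', 'stating main ideas', 'drawing conclusions']
```

The listed skills are the ones the evaluator might discuss under "Comments" when judging whether the proportions are "fairly similar."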






CTBS Español Reading Level C (Grade 2)



TEST/CURRICULUM ANALYSIS

(to be completed by project teachers)
Reading: Test Vocabulary

1. As a result of the Spanish language arts curriculum and other school
and non-school experiences, what words on the word list are students
likely to have seen, heard, read or used? Review the words on the
word list and circle each word that the students have not been exposed
to.

CAUTION: A child knows many more words than are taught in school.
Vocabulary is learned from many sources. Therefore, do not
limit your consideration of students' vocabulary to what is
covered in the curriculum.






CTBS Español Vocabulary, Level C (all words from Tests 1-3)

a, abajo, abuelita, acto, admirar, afortunado, afuera, agarrar, agua,
ahora, aire, al, alegre, algo, alguien, algunas, algunos, almohada,
alrededor, alto, alquiler, amigo, amigos, Ana, ancho, anillo, apellido,
aplaude, aquí, árbol, arena, arriba, asiento, astronautas, [illegible],
atrevido, autobús, automóvil, avión, avisarle, avísennos, ayuda, ayudar,
ayudarlo, barco, bailar, bateador, Beatriz, bebé, biblioteca, bicicleta,
blanca, Blanco, blancos, bonitas, bonitos, bosque, brazo, brillante,
brillantes, brinca, bueno, caballito, caballo, cabeza, caer, caja, calor,
caliente, callado, calle, caminar, camión, camisa, canción, cargar, cara,
carro, carta, casa, casi, Cata, cavar, cayó, cena, cerca, cerro, ciudad,
cochinitos, cogió, cohete, comenzó, comer, como, comprar, con, conchitas,
contar, contra, corriendo, corrió, costo, creer, cuál, cuando, cuatro,
cuchara, cuento, cuidado, cumpleaños, Chávez, chica, chico, dar, de,
debajo, debe, debería, debo, dejar, del, deletrear, desear, despierto,
despintado, después, día, dibujo, dice, diez, diferente, dijo, divirtió,
divirtiendo, dónde, dormido, dos, dulces, dura, durmió, duro, e, edad, el,
ella, ellos, empezar, en, encontramos, encontraron, enojó, ensalada,
enseñarle, enseñar, entre, era, eran, es, ese, [illegible], esconder,
escondieron, escuchan, escuela, esperamos, esperen, esta, estaba, estaban,
establo, estamos, [illegible], esto, estoy, estrellas, estuvieras,
excepto, familia, favor, felices, feliz, feo, fiesta, frijoles, fuerte,
fuertes, fue, galopar, galletas, García, gatear, gatito, gato, gente,
gracias, grande, gritar, guantes, gusta, gustan, hablo, hacer, [illegible],
hada, hambre, hambriento, hasta, helado, hermana, [illegible], hermoso,
hija, hijo, [illegible], hombre, hombres, hora, [illegible], horrible,
hotel, hoy, hoyo, hueco, huevo, iba, iban, iglesia, indio, indios, ir,
irme, [illegible], Jaime, jardín, Jesús, José, joven, juegan, jugamos,
jugar, juguete



[The remainder of the Level C word list is illegible in the source copy.]



Test 1: Vocabulario de Lectura (Reading Vocabulary)



Number of items: 33

Task: The student listens to the definition of a word read aloud. For
each item, the student selects from four printed words the one that best
fits the definition. Distractors include antonyms, contextually related
words, and unrelated words.

2. Have the students had practice in supplying a word in Spanish
to fit a definition?

yes, using a format identical to test items

yes, but using another format

no



Test 2: Comprensión de Lectura: Oraciones (Reading Comprehension:
Sentences)

Number of items: 23

Task: The student reads a sentence and selects the word that best
completes the sentence. A block of four answer choices is offered and
is located at the point in the sentence where the word is missing:
initial position, medial position, or final position. The sentence
completion item most often occurs in the middle of the sentence. The
average sentence length is seven words.

3. Have the students had practice reading complete sentences in
Spanish of at least seven words in length?

daily or weekly

only once or twice

none



4. Have the students had practice supplying a missing word in
a sentence?

yes, using a format identical to test items

yes, but using another format

no






Test 3: Comprensión de Lectura: Pasajes (Reading Comprehension:
Passages)



Number of items: 18

Task: The student reads six brief passages. Each passage is followed
by two to four multiple choice questions to be answered by the student.
These questions involve literal and near literal recall, use of context
clues, stating main ideas, drawing conclusions, and recalling sequence.
Paragraphs range from 5 to 17 sentences in length. The average sentence
is 8 words long.

5. Have the students had practice reading paragraphs in Spanish
that are at least 5 sentences in length?

daily or weekly

only once or twice

none



6. Have students had practice in answering questions based on
reading paragraphs in Spanish?

daily or weekly

only once or twice

none



7. If students have had such practice, what percentage of classroom
questions based on reading paragraphs utilize the following skills?

                                  less than   between        more than
                                  20%         20% and 50%    50%
literal and near literal recall
use of context clues
stating main ideas
drawing conclusions
recalling sequence
other






CTBS Español Reading Level C (Grade 2)



SUMMARY

(to be completed by project evaluator) 



Numbers in parentheses refer to question numbers on preceding pages.



1. What percent of the reading test vocabulary are students likely to
have seen, heard, read, or used? (1) ____% (The vocabulary list
contains 462 words.)

Comments:



2. The language arts skills that are tested are also part of the
curriculum. (2, 3, 4, 5, 6) Yes No

Comments:



3. Compare the kinds of questions asked in the reading test to the kinds
of questions asked in the reading curriculum. (7)



Question type                    | Percent of Curriculum | Percent of Test
literal and near-literal recall  |                       | 50
use of context clues             |                       | 11
stating main ideas               |                       | 5.5
drawing conclusions              |                       | 26
recalling sequence               |                       | 5.5
other                            |                       | 0



The kinds of questions asked in the reading portion of the test are
also practiced in the reading curriculum in a fairly similar
proportion. Yes No



Comments:



4. What major skills in the Spanish language arts curriculum are not
represented in the test?



5. What percentage of the curriculum does this represent?






CTBS Spanish or English Math Level C (Grade 2)



test/curriculum analysis 

(to be completed by project teachers) 



Math Battery 

1. What percent of the math curriculum is devoted to computations? ____

2. What percent of the math curriculum is devoted to math concepts,
applications, and story problems? ____

3. Do students have adequate vocabulary in the language in which
they are tested so they understand directions and word problems?
yes no

Test 4: Computación de Matemáticas / Mathematics Computation
Number of Items: 28

Task: The student performs a computation and chooses the correct
answer from the four that are provided. The computations consist
of addition, subtraction, and multiplication.

4. What percent of the computations in the math curriculum are
represented by:

addition

subtraction

multiplication

total 100%

5. What percent of the additions performed in the math curriculum
are represented by:

horizontal addition

vertical addition

total 100%

one-digit addition

two-digit addition

three-digit addition

total 100%

additions requiring carrying

additions with decimals



6. What percent of the subtractions performed in the math
curriculum are represented by:

horizontal subtraction

vertical subtraction

total 100%

one-digit subtraction

two-digit subtraction

three-digit subtraction

total 100%

7. What percent of the multiplications performed in the math
curriculum are represented by:

horizontal multiplication

vertical multiplication
total 100%

one-digit multiplication



Test 5: Conceptos y Aplicaciones de Matemáticas / Mathematics Concepts
and Applications

Number of Items: 25

Task: The student listens to a problem or a question read aloud and
selects from four possible answers.

8. Following is a list of the skills included in this test, with
the number of items devoted to each skill noted in parentheses.
Check in the space provided whether each skill is covered in
the curriculum, and decide how many total items this represents.

____ of 25 items represent skills covered in the curriculum.

Yes No

addition story problem (4)

equating number word to a set of items (1)

counting by more than 1 (2)

liquid measures (1)

adding money (2)

subtracting money (1)

applied numeration (days of week) (1)

telling time (2)

simple fractions (3)

single-digit horizontal addition: addends precede sum (1)

single-digit horizontal addition: sum precedes addend (1)

setting up a story problem for addition (1)

setting up a story problem for subtraction (1)

application of addition to a ruler-like scale (1)

application of addition to time (1)

missing subtrahend (1)

(Caution: do not simply add checks. For each item checked, add
the number in parentheses at the end of that line.)
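The caution above amounts to a weighted count: each skill marked "Yes" contributes its item count, not 1. The sketch below illustrates this with six of the listed skills; the Yes/No marks are invented for illustration and are not data from any site.

```python
# Hypothetical example: the Yes/No marks below are invented; only the item
# counts in parentheses come from the Test 5 skill list.
TOTAL_ITEMS = 25  # items on Test 5

# skill -> (number of test items, covered in the curriculum?)
skills = {
    "addition story problem": (4, True),
    "counting by more than 1": (2, False),
    "adding money": (2, True),
    "telling time": (2, True),
    "simple fractions": (3, False),
    "missing subtrahend": (1, True),
}


def covered_items(skill_table):
    """Sum the item counts of the skills checked "Yes" (not the number of checks)."""
    return sum(items for items, covered in skill_table.values() if covered)


n_covered = covered_items(skills)
n_checks = sum(1 for _, covered in skills.values() if covered)
print(f"{n_covered} of {TOTAL_ITEMS} items ({100 * n_covered / TOTAL_ITEMS:.0f}%)")
print(f"simply adding checks would give {n_checks}")
```

Here four checks correspond to 9 items, which is the figure that belongs in the blank, not 4.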






CTBS Spanish or English Math Level C (Grade 2)

SUMMARY

(to be completed by project evaluator)
Numbers in parentheses refer to question numbers on preceding pages.

1. Compare curriculum to test. (1, 2)

Content area                                  | Percent of Curriculum | Percent of Test
Computations                                  |                       | 53
Math concepts, applications, story problems   |                       | 47

Match is appropriate? yes no

Comments:
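The Percent of Test figures in item 1 (53 and 47) appear to follow directly from the item counts in the test descriptions: 28 computation items (Test 4) and 25 concepts and applications items (Test 5). A minimal sketch of that arithmetic, assuming the two tests together make up the math battery being compared:

```python
# Item counts from the test descriptions: Test 4 (computation) has 28 items,
# Test 5 (concepts and applications) has 25 items.
item_counts = {
    "computations": 28,
    "math concepts, applications, story problems": 25,
}

total_items = sum(item_counts.values())  # 53 items in the battery

# Percent of test = items in the category / all math items, rounded
percent_of_test = {
    category: round(100 * n / total_items)
    for category, n in item_counts.items()
}
print(percent_of_test)
# {'computations': 53, 'math concepts, applications, story problems': 47}
```

The teacher-supplied curriculum percentages from questions 1 and 2 can then be set beside these values to judge whether the match is appropriate.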



2. Students have an adequate vocabulary for the math test? (3)

Yes

No

Comments:






3. Compare curriculum to test. (4, 5, 6, 7)

Computation type               | Percent of Curriculum | Percent of Test
addition                       |                       | 36
subtraction                    |                       | 36
multiplication                 |                       | 28
total                          | 100                   | 100

horizontal addition            |                       | 40
vertical addition              |                       | 60
total                          | 100                   | 100

one-digit addition             |                       | 20
two-digit addition             |                       | 60
three-digit addition           |                       | 20
total                          | 100                   | 100

addition requiring carrying    |                       | 40
addition with decimals         |                       | 10

horizontal subtraction         |                       | 40
vertical subtraction           |                       | 60
total                          | 100                   | 100

one-digit subtraction          |                       | 0
two-digit subtraction          |                       | 80
three-digit subtraction        |                       | 20
total                          | 100                   | 100

horizontal multiplication      |                       | 100
vertical multiplication        |                       | 0
total                          | 100                   | 100

one-digit multiplication       |                       | 100

4. In the math concepts test, ____ out of 25, or ____% of the test,
represents items that the students have practiced in the curriculum. (8)

Comments:






5. What major skills from the math curriculum are not represented by the 
test? 



6. What percentage of the curriculum does this represent?






COLLECTING DATA 



Major Content Items

7-A. DATA COLLECTION PROCEDURES (checklist)
7-B. DATA RECORDING FORM



7. COLLECTING DATA



Data collection includes obtaining student background information,
gathering teacher opinions, observing classroom operation, and a variety
of other activities, but the focus of this section is the administration
and scoring of tests and the recording of the scores. Of all the topics
addressed in this manual, data collection is the only one with no major
unresolved theoretical issues. To obtain clean data, all that is required
is to follow simple, widely known procedures. Yet data collection
problems are a major reason for the lack of credibility in educational
evaluations.



Key Problems 

Testing procedures. Adequate testing procedures simply
require following the publisher's instructions exactly and
making sure that pre- and posttesting conditions and procedures
are identical. While this is not difficult, it does require some
effort on everyone's part. Most problems are probably due to a
lack of understanding of the importance of careful data
collection. See Item 7-A.

Test scoring and data recording. Both scoring and recording
are subject to clerical errors, but these errors can easily be
held to an acceptable level through adequate care and accuracy
checks. More difficult to deal with are scoring procedures that
require the scorer to make judgments. (See Item 7-A.) The major
problem in recording data is to provide all the essential
information in a manageable format. (See Item 7-B.)



Related Issues

Training testers. For experienced testers using a familiar test,
it is sufficient to bring the group together briefly within a few days
of the beginning of testing to review the tests and testing procedures.
For new tests or inexperienced testers, each tester must practice
administering the entire test under the supervision of the evaluator.

Testing on appropriate dates. Testing should be done within a few
days of the same date each year. For norm-referenced evaluations, the
testing should be within a week or two of the time that normative data
were collected by the test publisher (or local district). Tests must
also be spread out over days so that the burden on the students is not
so great as to lower scores. Pre- and posttesting must follow similar
schedules.
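A quick way to catch off-schedule testing is to compare every testing date with the norming date. The sketch below assumes a 14-day tolerance as one reading of "within a week or two"; the function name and all dates are hypothetical.

```python
from datetime import date

# Assumption: "within a week or two" is read here as a 14-day tolerance.
WINDOW_DAYS = 14


def outside_norming_window(test_dates, norming_date, window_days=WINDOW_DAYS):
    """Return the testing dates that fall more than window_days from norming_date."""
    return [d for d in test_dates if abs((d - norming_date).days) > window_days]


# Hypothetical dates for illustration only
norming = date(1979, 10, 15)
tested = [date(1979, 10, 9), date(1979, 10, 22), date(1979, 11, 12)]
print(outside_norming_window(tested, norming))  # [datetime.date(1979, 11, 12)]
```

The same check can be run on posttest dates against the spring norming date; any flagged class should be noted in the report.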

Recording data for longitudinal evaluations. A data recording form
that works well for a single fall-to-spring evaluation may not be suitable
for following student progress over several years. Student attrition,
regrouping of classes each year, and the total number of scores involved
all present problems. Appropriate individual student record forms may be
the best solution. See Item 7-B.




Data Collection Procedures 



Outline 



Assembling the students 

• Similar testing conditions should be used for all treatment and
comparison groups. The time, place, and date of test administration
should be considered. Technical manuals for test administration
often contain recommendations on testing procedures (e.g., avoid
afternoon testing or testing on Monday and Friday).

• Distractions should be minimized. Avoid testing in the hall, or
in the cafeteria as lunch is being prepared.

• Coordinate testing efforts with district testing or assessment
policies and procedures.

• Consider teaching test-taking skills to students. This includes
acquainting students with test formats, etc., NOT teaching to
the actual test.

• Plan for make-up testing.

Administering the test

• Identify testers. If teachers do not speak the appropriate
language, identify alternative testers.

• Conduct inservice training for all test administrators. If
aides and parents will be used in testing, more intensive
training will be required for them. The items on the list below
should be addressed:

- Familiarity with materials

- Clarity of presentation

- Adherence to guidelines and time limits

- Control in the classroom

- Attention to physical conditions (e.g., seat spacing)

- Practice for individual testing

- Correct choice of testing dates (e.g., norming dates)

- The need for the inevitable "fill-in" of absentees

• Clearly define roles and responsibilities of testers. Inservice
training and determination of roles and responsibilities should
be assertively coordinated by the project director.

Scoring the test 

• Train test scorers. 

• Scored tests should be spot-checked by someone else.

• Check Interrater reliability. 

Recording scores 
(See Item 7-B) 
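The outline asks for an interrater reliability check but does not name a statistic. Two common choices are simple percent agreement and Cohen's kappa, which discounts the agreement two scorers would reach by chance; the sketch below assumes both scorers independently scored the same set of responses, and the scores shown are hypothetical.

```python
from collections import Counter


def percent_agreement(scores_a, scores_b):
    """Proportion of responses that two scorers scored identically."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both scorers must score the same, non-empty set of responses")
    return sum(a == b for a, b in zip(scores_a, scores_b)) / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Agreement corrected for the agreement expected by chance (Cohen's kappa)."""
    n = len(scores_a)
    observed = percent_agreement(scores_a, scores_b)
    freq_a, freq_b = Counter(scores_a), Counter(scores_b)
    # Chance agreement: both scorers independently pick the same category
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both scorers use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical right(1)/wrong(0) scoring of ten open-ended responses
scorer_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
scorer_2 = [1, 1, 0, 0, 0, 1, 1, 0, 1, 1]
print(percent_agreement(scorer_1, scorer_2))        # 0.9
print(round(cohens_kappa(scorer_1, scorer_2), 2))  # 0.78
```

Kappa is lower than raw agreement here because most responses are scored "right" by both scorers, so much of the agreement could occur by chance.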



7-B

Data Recording Forms 

Recording the scores is the final step in the data collection
process, but, to ensure that the scores will be usable, the details of
recording should be worked out well before pretest time. Where a
commercial scoring service is used, the school evaluator may have little
control over the recording process, but if the school elects to do its
own scoring or wishes to transfer scores from computer printouts to a
more convenient form, the evaluator must consider two important issues:
the accuracy of the data, and the details of the data recording forms.

Copying scores accurately onto data forms is not a complicated
problem for small-scale local studies, but it must not be overlooked.
Even the most conscientious recorders make errors, and all data forms
should be carefully proofread, preferably with one person reading aloud
while a second person checks the scores.
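Where scores are keyed into a computer rather than copied by hand, a common alternative to read-aloud proofreading (a different technique from the one described above) is double entry: two people key the same sheet independently and every disagreement is checked against the original. A minimal sketch, with hypothetical IDs and scores:

```python
def entry_discrepancies(first_entry, second_entry):
    """Compare two independently keyed copies of the same score sheet.

    Each argument maps a student ID to a score. Returns (id, first, second)
    for every ID whose two entries differ or that is missing from one copy.
    """
    mismatches = []
    for student_id in sorted(set(first_entry) | set(second_entry)):
        a = first_entry.get(student_id)
        b = second_entry.get(student_id)
        if a != b:
            mismatches.append((student_id, a, b))
    return mismatches


first = {"101101": 23, "101102": 17, "101103": 30}
second = {"101101": 23, "101102": 71, "101103": 30}
print(entry_discrepancies(first, second))  # [('101102', 17, 71)]
```

Transposed digits such as 17 and 71 are exactly the kind of clerical error this catches.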

The details of the data forms might appear to be of little
importance, but in many school districts the way in which data have been
recorded virtually precludes any reasonable analyses. It is not possible
to prescribe a standard data format because school requirements vary so
widely, but it is possible to state two general principles which must be
observed. First, all scores must be completely identified, and second,
scores must be arranged in a way that facilitates analysis. Sample data
forms illustrating these principles are attached. Specific issues related
to the use of such forms are discussed below.






Considerations for data recording forms 



1. Most sets of scores require more than one page. The page number
identifies each sheet and the "number of pages" helps make sure
no pages are missing.

2. Every sheet of paper should have a name and date to indicate
who filled in the numbers in case any questions arise in the
future.

3. The group for which data are recorded should be clearly identified
at the top of the page to simplify the retrieval of that
group's data from a large data base.

4. The page should be arranged so that it can be photocopied
without the students' names. This permits wide use of the data for
research purposes without compromising student privacy.

5. It simplifies analysis greatly to have only one test (pre and
post) recorded on each sheet, provided the rules for listing
students (see points below) are followed. The complete
name of the pretest and posttest (taken exactly from the test
booklets and including publication date) must be listed. This
point is widely neglected.

6. Identifying students and organizing their names efficiently are
the most difficult problems in recording student data. Where
evaluations are only for one year and are based on fall and
spring testing, the problems can be solved with a little effort
and care. But where students must be followed over several
years, there is no simple solution, since students come and go
from projects and groups are reorganized every year. The
simplest rule is to make sure that the posttest scores are all
entered on the same sheet of paper as the corresponding pretest
scores. This at least eliminates the problem of the evaluator
trying to find each student's name on two lists.

7. A second rule for listing student names is to establish a
standard ordering of the names, and stick to it for the life
of the evaluation and for all tests that are used. If a student
moves or fails to take some of the tests, then the appropriate
entries are blank, of course, but he or she should not be
eliminated from the list. If new students enter the program, their
names should be added to the end of the lists for all tests,
even those for which no data will be entered. In addition to
the obvious reduction in confusion, there are some practical
advantages to this procedure. For example, a master form can
be prepared with only the students' names and identification
numbers filled in, and the forms can simply be duplicated when
new tests are given. It also makes comparisons or correlations
between any two sets of scores relatively easy, because any two
forms can be laid side by side and the corresponding names will
line up correctly. If there is a compelling reason to change
the order of student names in the middle of a project, then
either all forms should be changed, or a double set of forms
(old and new order) should be maintained.

8. A rule should be established for recording names. "Caldwell,
D. E." should never become "Danny Caldwell" on a second list.
The simplest procedure is to allow plenty of space and to spell
out first names and middle initials (e.g., Caldwell, Daniel E.).

9. Each student should have an ID number that completely identifies
him or her. The example in Figure 4 uses a one-digit experimental
condition number, a two-digit group or class identification,
a one-digit sex code, and a two-digit student number. In some
evaluations, other codes (including letters) can be used, but
careful consideration of the situation is necessary in order to
permit any desired grouping simply by ID number.

10. A page should have some reasonable number of entries, probably
20 or 25. For some inexplicable reason, numbers like 27 and 33
are popular, and often the number of entries varies from page to
page. Unnecessary complications like this help to make the
statistician's life miserable.






11. Test dates are critical, especially in norm-referenced
evaluations. If all students listed on a page have their pretests in
one day and all are later posttested in a single day, then the
test date column is not really necessary. However, this is
usually impossible to predict at the time the form is made up,
so the columns should be there in order to permit identification
of make-up tests and late entries into the program.

12. Pre- and posttest scores should, in general, be in adjacent
columns, rather than pairing each pretest raw score with its
standard score, percentile score, etc., followed by each posttest
score and its transformations. This greatly simplifies
the mechanics of analysis; comparisons are nearly always made
between pre- and posttest scores of the same type.
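Points 7 and 9 can be mirrored in a computer file as well as on paper. The sketch below is one possible arrangement, not a prescribed format: `StudentID` follows the six-digit layout described in point 9, and `Roster` keeps a single fixed order in which late entries are appended and missing scores stay blank. Both class names, and all IDs and scores, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudentID:
    """Six-digit ID following the layout in point 9:
    condition (1 digit) + group/class (2) + sex code (1) + student number (2).
    """
    condition: int
    group: int
    sex: int
    student: int

    def __str__(self):
        return f"{self.condition}{self.group:02d}{self.sex}{self.student:02d}"

    @classmethod
    def parse(cls, text):
        if len(text) != 6 or not text.isdigit():
            raise ValueError(f"expected six digits, got {text!r}")
        return cls(int(text[0]), int(text[1:3]), int(text[3]), int(text[4:6]))


class Roster:
    """Master list in a fixed order (point 7); late entries go at the end."""

    def __init__(self):
        self.ids = []     # one order shared by every test form
        self.scores = {}  # test name -> {StudentID: score}

    def add_student(self, sid):
        if sid not in self.ids:
            self.ids.append(sid)

    def record(self, test, sid, score):
        if sid not in self.ids:
            raise KeyError(f"{sid} is not on the master list")
        self.scores.setdefault(test, {})[sid] = score

    def column(self, test):
        """Scores for one test in master-list order; None marks a blank entry."""
        given = self.scores.get(test, {})
        return [given.get(sid) for sid in self.ids]


roster = Roster()
for text in ["101101", "101102", "102201"]:
    roster.add_student(StudentID.parse(text))
roster.record("pretest", StudentID.parse("101101"), 21)
roster.record("pretest", StudentID.parse("101102"), 18)
roster.add_student(StudentID.parse("101103"))  # late entry, appended
roster.record("posttest", StudentID.parse("101101"), 27)
roster.record("posttest", StudentID.parse("101103"), 25)
print(roster.column("pretest"))   # [21, 18, None, None]
print(roster.column("posttest"))  # [27, None, None, 25]
```

Because every column comes out in the same order, any two tests line up row by row, and grouping by ID fields (for example, all students with `condition == 1`) needs no name matching.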






Cover Sheet for Data Recording Forms

School:
Class:
Year:
Grade:

Column-group headings: Biodata, Demo Data, Classification, Treatment

Student | Age in Sept. | Yrs in U.S. | Language | Birth | Yrs in Program | Reading | Teach



ANALYZING THE DATA AND REPORTING THE RESULTS 



Major Content Items

8-A. DATA ANALYSIS CHECKLIST

8-B. REPORT-WRITING CHECKLIST FOR BILINGUAL-PROGRAM EVALUATORS
8-C. SAMPLE DATA-REPORTING TABLES



8. ANALYZING THE DATA AND REPORTING THE RESULTS



Data analysis and reporting is a complex undertaking that requires a
person with adequate training. If such expertise is not available in the
district, outside assistance should be sought. This section is written
with the assumption that a competent evaluator will direct the analysis,
and it focuses only on a few of the most common deficiencies encountered
in educational evaluation reports.

A widespread problem in analyzing data is the failure to tie the
analyses to the overall evaluation design. Many analyses simply do not
answer the basic questions posed in the reports. Simple analyses that
follow directly from the questions posed should be used. Sophisticated
statistical approaches (e.g., multiple regression techniques) are usually
not warranted, and most smaller districts probably do not have the
resources to employ such designs. Especially important, and widely
ignored, is a careful examination of the data for obvious irregularities.
Efforts in report writing should be focused on providing complete but
concise information rather than elaborate diagrams and exhaustive sets
of uninterpreted data tables.



Key Problems 

Grouping students for analysis. One of the major criticisms
of bilingual program evaluations is that they lump together a
wide range of students who have different characteristics and who
receive a variety of (poorly described) services. Unless the
reader of the report understands the characteristics of the
students and the treatments they receive, discussions of
achievement impacts will have little meaning. At an absolute minimum,
students must be grouped for analysis according to language
proficiency in both languages and according to the subjects they
study (e.g., English reading, target language reading). If there
are major differences in amount or type of instruction received by
different students, then additional groups will be needed. (See
Item 8-A.)
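The minimum grouping described above (proficiency in each language, crossed with subject studied) can be produced with a simple keyed summary. In this sketch the proficiency labels, subjects, and NCE gains are all hypothetical; a real analysis would use the project's own proficiency categories.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records:
# (English proficiency, native-language proficiency, subject studied, NCE gain)
records = [
    ("low", "high", "English reading", 4.0),
    ("low", "high", "English reading", 6.0),
    ("low", "high", "Spanish reading", 2.0),
    ("high", "low", "English reading", 1.0),
    ("high", "low", "English reading", 3.0),
]


def group_summaries(rows):
    """N and mean gain for each (English level, native level, subject) group."""
    groups = defaultdict(list)
    for english, native, subject, gain in rows:
        groups[(english, native, subject)].append(gain)
    return {key: (len(gains), mean(gains)) for key, gains in groups.items()}


for key, (n, avg) in sorted(group_summaries(records).items()):
    print(key, f"N={n}", f"mean gain={avg:.1f}")
```

Reporting N for every group makes it obvious when a subgroup is too small to interpret on its own.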

Presenting complete, convincing arguments. It is extremely
rare to find an educational evaluation report that presents a
complete argument for the existence of achievement impacts.
Truly convincing reports are virtually unknown. Yet presenting
a reasonable argument is not difficult. The reader needs to know
(a) the student characteristics, (b) the program goals, (c) the
program features that are designed to achieve the goals, and (d)
the results in terms of student scores. "Results," of course, must
include the exact tests and procedures used. Finally, the relation
between the treatment and the results must be summarized for the
reader. These evaluation report basics are covered in Sections 2
through 8 of this manual and are summarized in Items 8-B and
8-C.






Related Issues 

Floor and ceiling effects. Floor and ceiling effects are pervasive
problems in bilingual-program evaluations. A minimal check for
multiple-choice tests is to be sure that mean classroom or school raw
scores are no lower than 25 percent of the items correct for four-choice
tests, 33 percent for three-choice tests, and so on. Mean raw scores
should not exceed 75 percent of the total possible raw score on any
test. Outside of these values, the likelihood of floor or ceiling
effects, respectively, should be noted in the report.
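The minimal check above reduces to two thresholds: the chance level (number of items divided by number of answer choices) and 75 percent of the possible raw score. A short sketch, using Test 2 (23 four-choice items) from the earlier test descriptions; the class means are hypothetical.

```python
def floor_ceiling_flags(mean_raw, n_items, n_choices):
    """Minimal floor/ceiling screen for a mean raw score on a multiple-choice test.

    Floor: mean below the chance level, n_items / n_choices
           (25% of the items for four-choice tests, 33% for three-choice).
    Ceiling: mean above 75% of the total possible raw score.
    """
    chance_level = n_items / n_choices
    ceiling_level = 0.75 * n_items
    flags = []
    if mean_raw < chance_level:
        flags.append("possible floor effect")
    if mean_raw > ceiling_level:
        flags.append("possible ceiling effect")
    return flags


# Test 2 has 23 four-choice items: chance = 5.75, 75% = 17.25.
# The mean raw scores below are hypothetical.
for class_mean in (5.0, 12.0, 18.0):
    print(class_mean, floor_ceiling_flags(class_mean, n_items=23, n_choices=4))
```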

Grade equivalent scores and other scales. Never use grade equivalent
scores for any purpose. Use normalized standard scores (preferably NCEs)
for all computations and calculations of impacts. Report pre- and
posttest performance to general audiences in percentiles.
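Normal curve equivalents (NCEs) are normalized standard scores with a mean of 50 and a standard deviation of about 21.06, chosen so that NCEs of 1, 50, and 99 coincide with the 1st, 50th, and 99th percentiles. Unlike percentiles, NCEs are equal-interval, so they can be averaged and subtracted. The sketch below converts in both directions using Python's standard normal distribution; publishers' conversion tables may differ slightly because of rounding.

```python
from statistics import NormalDist

NCE_SD = 21.06  # NCE scale: mean 50, standard deviation about 21.06


def percentile_to_nce(percentile):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile / 100)  # standard normal deviate
    return 50 + NCE_SD * z


def nce_to_percentile(nce):
    """Convert an NCE back to a percentile for reporting to general audiences."""
    return 100 * NormalDist().cdf((nce - 50) / NCE_SD)


# Hypothetical group: pretest at the 30th percentile, posttest at the 40th
gain = percentile_to_nce(40) - percentile_to_nce(30)
print(f"gain = {gain:.1f} NCE units")
```

In this example the 30th and 40th percentiles convert to roughly 39.0 and 44.7 NCEs, a gain of about 5.7 NCE units; the computation is done in NCEs and the percentiles are kept for the general-audience summary.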

Statistical versus educational significance. Statistical significance
says nothing about the size or importance of a program impact and should
not be discussed in reports to general audiences. The real issue is
whether the impact represents a noticeable reduction in the achievement
problems to which the program is addressed.

Single-year versus longitudinal analysis. Most bilingual program
evaluations are restricted to the effects of a single year. Such
evaluations are not convincing. It is necessary to demonstrate that there is
continuing year-to-year progress toward program goals.

Level of precision of the evaluation. Throughout this manual, the
lack of precision of real-world educational evaluations has been
emphasized, and the evaluation report should make this problem clear to the
reader. On the other hand, if a program truly improves student
achievement, this fact will show up clearly over a period of a few years in
carefully conducted evaluations. Thus, while no single-year evaluation can
be completely convincing, consistent results and trends over years will
eliminate most doubts.

Executive and other summaries. The executive summary may be the
most important part of the report since it will be the most widely
read. The summary should cover all of the major report headings but
should emphasize results and recommendations. (See Item 8-B.) Five to
six pages should be enough. A copy of the executive summary should be
written in the language (or languages) of the project parents and
distributed to them.



8-A

Data Analysis Checklist 
Outline 

I. General principles

A. Analyze data both by individual years for short-term goals and
cumulatively for long-term goals.

B. Separate data according to language proficiency groups.

C. Separate data further according to instructional treatment.

II. Preparation (applies to most evaluation designs)

A. Convert raw scores to standard scores (preferably normalized
standard scores such as NCEs). Use these scores for all
analyses.

B. Separate out those students with both pre- and posttests.

1. Compute means and standard deviations.

2. Plot the distributions of pretest scores.

3. Plot the distributions of posttest scores.

4. Plot the joint distribution of pretest and posttest scores.

C. For students with pretest scores only:

1. Compute the mean and standard deviation.

2. Plot the distribution of scores.

D. For students with posttest scores only:

Save the scores for student files and for use as next year's
pretest scores.

III. Check for irregularities in the data:

A. Floor or ceiling effects.

B. Large changes in standard deviations from pretest to posttest.

C. Low correlations between pre- and posttest scores, or irregular
joint distributions.

D. Differences between students who took the posttest and those
who dropped out.

E. Look for any other features of the data that strike you as
strange, and be sure that you can explain them. Ideally, item
data should be examined.

IV. Apply the statistical or other procedures relevant to the particular
evaluation design in use.

Be sure that your analyses are relevant to the questions you are 
trying to answer. 



Report-Writing Checklist for Bilingual Program Evaluators



This checklist presents an outline that can be followed in preparing
an evaluation report. The "Section Reference" to the right of each topic
refers to the section of this manual that deals with the topic.

The purpose of the outline is to suggest one logical order of
presentation of topics. There are, of course, other ways of organizing the
report. A second function of the outline, however, is to provide a
comprehensive reminder to the program director and evaluator of the kinds of
information that may be included in a report. In the PIP field test
evaluation reports, sections on student selection criteria and procedures,
and interpretation of findings, were frequently not included. Such
information should be included in a report to be considered complete.



Report-Writing Checklist for Bilingual Education Program Evaluators



I. Executive Summary

A. Summary of findings
B. Recommendations

II. Program Overview and Background

A. Brief program description

B. Major goals

C. Context of program

D. Program history and district needs

E. Target student needs

III. Description of Evaluation

A. Purposes and audiences

B. Evaluation staff and roles

C. Design

1. Questions addressed

2. Comparison standards

3. Constraints and questions not
addressed

F. Continuity with previous and future
years' evaluations

IV. Parent and Community Component

A. Goals and objectives

B. Description of activities

C. Process evaluation

1. Measures used

2. Data collection procedures

3. Analyses and results

4. Interpretation

5. Recommendations


Section    Check when done

8

8

8



3, 8
3-A

2
3-A
3-A

5



4

1

4
4
4



4



4



App. B-3
App. B-3
App. B-3
App. B-3
App. B-3
App. B-3
App. B-3
App. B-3
App. B-3



D. Outcome evaluation App. B-3

1. Measures used App. B-3

2. Data collection procedures App. B-3

3. Analyses and results App. B-3

4. Interpretation App. B-3

5. Recommendations App. B-3

V. Staff Development Component App. B-2

A. Goals and objectives App. B-2

B. Description of activities App. B-2

C. Process evaluation App. B-2

1. Measures used App. B-2

2. Data collection procedures App. B-2

3. Analyses and results App. B-2

4. Interpretation App. B-2
5. Recommendations App. B-2

D. Outcome evaluation App. B-2

1. Measures used App. B-2

2. Data collection procedures App. B-2

3. Analyses and results App. B-2
4. Interpretation App. B-2

5. Recommendations App. B-2

VI. Students 5

A. Selection criteria and procedures 5

1. Legal requirements 5

2. Make-up of program classrooms
and definition of "project
student" 5

3. Criteria for selection of students
of limited English proficiency 5

a. Tests and cutoff scores used 5, 6

b. Role of teacher judgment 5, 6

c. Role of parent wishes 5, 6

d. Method of combining criteria 5, 6






4. Criteria for selection of students
proficient in English 5

a. Criteria used 5

b. Method of application of
criteria 5

5. Exit criteria and follow-up 5

6. Student turnover 5, 9
7. Effects of selection criteria
and procedures on evaluation
design 4, 3
8. Recommendations for improvement
of entry/exit criteria and
procedures 5

B. Description of students 5

1. Characteristics at beginning
of year 5

a. Language proficiency 5, 6

(1) English 5, 6

(2) Non-English language 5, 6

b. Achievement level 5, 6

c. Biographic data 5-A

(1) Country of birth 5-A

(2) Years of residence in
U.S. (if applicable) 5-A

(3) Home language use 5-A

(4) Previous educational
experience 5-A

(5) Other 5-A

d. Demographic data 5-A

(1) SES 5-A

(2) Other 5-A

2. Current experience
characteristics

a. Attendance 5-A, 9

b. Key treatment variables 3






(1) Reading in English

(2) Reading in non-English
language

(3) Second language
instruction

(4) Participation in other
special projects

3-C

3-C

VII. Instructional Component

A. Goals and objectives 2
1. Areas to cover 2

a. Achievement ?

b. Affect 9

2. Breakdown of goals and objectives by 2
a. Grade level 2
b. Language proficiency group 2
c. Subject area 2
d. Language of subject area 2
e. Number of years of participation in project 2

3. Time frame 2

a. Short-term goals 2, 4

b. Long-term goals 2, 4

4. Explanation of bases for
establishing criteria for success 2

5. Follow-up goals 2

B. Description of instruction 3

1. Program-level instructional
features 3-A

2. Classroom-level instructional
features 3-B, 3-D

3. Reading instruction 3-C

4. Level and extent of description 3
a. Describe instruction at
appropriate level (indiv.,
groups, classroom) depending
on homogeneity of instruction 3






b. Longitudinal description 3
5. Characteristics of instructional
staff 3
6. Description of treatment received
by comparison students or norming
group 3

C. Process evaluation

1. Tests and measures used 6

2. Data collection procedures 7
3. Data analysis and results 8

4. Interpretation of findings 8

5. Summary of recommendations made
to improve instruction 8

D. Outcome evaluation

1. Tests and measures used 6
a. Relation of measures to goals 2, 6

b. Description of measures 6

(1) Language 6

(2) Content 6
(a) match between
content of test
and content of
curriculum 6-C
(b) cultural and linguistic
appropriateness 6

(3) Technical properties 6
(a) validity and
reliability 6
(b) floor and ceiling
effects 6

(4) Form, level, edition 6
2. Data collection procedures 7-A

a. Explanation of which students
were tested in which language(s)
and rationale 3, 5






b. Qualifications and training
of testers, observers,
interviewers 7

(1) Training 7

(2) Language skills 7

(3) Familiarity with students,
parents, etc. 7

c. Schedules of data collection 7

d. Scoring and recording 7-B
3. Analysis and results 8

a. Explanation of scales used 6, 8, 8-A

b. Floor and ceiling effects 6, 8, 8-A

c. Unit of analysis 8, 8-A

(1) By language proficiency
group 5, 8

(2) By treatment group 3, 8
d. Explanation of irregularities 7, 8

(1) Attrition 8

(2) Bad data 7, 8

e. Scope of analysis 8

(1) Relation to previous
years 4, 8

(2) Plans for future
continuity 4, 8

f. Tables of test results 8-C
4. Interpretation of findings in
light of: 8

a. Short-term and long-term
goals 2, 8

b. Degree of program
implementation 3, 8

c. Specific instructional
treatment 3, 8

d. Teacher characteristics 3, 8






e. Similarities and differences
between treatment group and
comparison (or norm) group 4, 8

f. Number of years of student
participation in project 5, 8

g. Match between tests and
curriculum 6, 8

h. Limitations of tests 6, 8

i. Data collection procedures 7, 8

5. Recommendations for improvement

a. Instruction 8
b. Evaluation 8






SAMPLE DATA REPORTING TABLE 



The use of data reporting tables enables the evaluator to provide
a great amount of information in a concise and easy-to-read form. For
the reader, tables provide an easy means of grasping the quantitative
information a report has to offer.

In order to display data effectively, there are a number of
information items that should be included. A table should identify what
information is being provided, for what group or subgroup, and for how
many participants (N).

In identifying the test used, the test edition year, form, language,
and level should also be specified. It is also advantageous to report
the number of items the test contains per subtest plus the date the
test was administered. When reporting numerical data, it is necessary
to identify pretest and posttest data and to provide means, standard
deviations, and gains.

A typical error is the failure to report the type of scores. The
table should indicate whether scores are percentiles, standard scores,
or raw scores.

Two sample data reporting toxms are provided* one for reporting 
raw scores and the other for standard scores and percentiles. Etaw score 
tables may be more useful to teachers who are familiar with the test. 
Percentiles and standard scores may be move useful for reporting to 
program administration and program monitors. The tables can be adapted 
to suit the needs of individual programs* 
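As a rough illustration of how the N, mean, standard deviation, and gain entries on these forms might be filled in, the Python sketch below keeps only students who have both a pretest and a posttest score (the definition of N used on the forms) and summarizes the matched scores. The student identifiers and scores are hypothetical, and the function name `summarize_matched` is an illustrative choice, not part of any existing tool.

```python
from statistics import mean, stdev


def summarize_matched(pre: dict[str, float], post: dict[str, float]) -> dict:
    """Summarize pretest/posttest scores for students who took both tests.

    `pre` and `post` map a (hypothetical) student ID to a score of a single type
    (raw scores, standard scores, or NCEs); score types should not be mixed.
    """
    matched = sorted(pre.keys() & post.keys())  # students with both scores
    if len(matched) < 2:
        raise ValueError("need at least two matched students to compute an S.D.")
    pre_scores = [pre[s] for s in matched]
    post_scores = [post[s] for s in matched]
    gains = [b - a for a, b in zip(pre_scores, post_scores)]
    return {
        "N": len(matched),
        "pre_mean": mean(pre_scores),
        "pre_sd": stdev(pre_scores),
        "post_mean": mean(post_scores),
        "post_sd": stdev(post_scores),
        "gain_mean": mean(gains),
        "gain_sd": stdev(gains),
    }


# Hypothetical raw scores: s4 has no posttest and s5 has no pretest,
# so both are left out and N is 3 (students s1, s2, s3).
pretest = {"s1": 20, "s2": 25, "s3": 30, "s4": 18}
posttest = {"s1": 24, "s2": 27, "s3": 35, "s5": 22}

summary = summarize_matched(pretest, posttest)
print(summary["N"], summary["pre_mean"], round(summary["gain_mean"], 2))
```

Restricting every statistic to the same matched students keeps the pretest mean, posttest mean, and mean gain consistent with one another, which is why the forms define N that way.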






SAMPLE DATA RECORDING TABLE

Program:
Language Classification of Group:
Grade:
Subject(s):

Test Description

         | Name | Language | Subtest (if used) | Level | Form | Edition | Norms Used (if any) | Testing Dates
Pretest  |      |          |                   |       |      |         |                     |
Posttest |      |          |                   |       |      |         |                     |

           |     N      | Standard Scores (or NCEs)   | Percentile Equivalents | Pre-Post Change
           |            | Pretest      | Posttest     |                        | (NCE Units)
Subtest(s) | Avg. | N   | Mean | S.D.  | Mean | S.D.  | Pretest  | Posttest    | Mean | S.D.
           |      |     |      |       |      |       |          |             |      |

Avg. = Average daily enrollment
N = The number of students who had both pretest and posttest



SAMPLE DATA RECORDING TABLE

Program:
Language Classification of Group:
Grade:
Subject(s):

Test Description

         | Name | Language | Subtest (if used) | Level | Form | Edition | Norms Used (if any) | Testing Dates
Pretest  |      |          |                   |       |      |         |                     |
Posttest |      |          |                   |       |      |         |                     |

Raw Scores

           |              |     N      | Pretest      | Posttest     | Pre-Post Change
Subtest(s) | No. of Items | Avg. | N   | Mean | S.D.  | Mean | S.D.  | Raw Score Gains
           |              |      |     |      |       |      |       |

Avg. = Average daily enrollment
N = The number of students who had both pretest and posttest






Appendices 



APPENDIX A 
HOW BIG ARE ACHIEVEMENT GAINS 






How Big Are Achievement Gains?



In order to be able to set realistic goals and to interpret gains made in bilingual programs, it is useful to have in mind the gains ordinarily made by English-speaking students in traditional all-English programs and in special programs such as Title I. The size of achievement gains resulting from special educational programs is generally small.

However, the differences between bilingual programs and traditional or other special all-English projects raise several issues that must be taken into account in considering the size of achievement gains. First, gains must be measured in the students' primary language as well as in English. This is important since much of the instructional time, at least in the early stages, is devoted to teaching content through the primary language and developing primary language skills. Second, since reading instruction in the second language may begin after reading skills are developed in the primary language, the grade level at which English reading gains can be expected depends on the curriculum of the program. Third, it may be inappropriate to speak of gains relative to the norming population, since English norms are not appropriate comparison standards for students of limited English proficiency, and no adequate norms for languages other than English are available at the time of printing (see Chapters 4 and 6).

Normal Classroom Growth in All-English Traditional Programs

In order to discuss the size of achievement gains, it is necessary to have a meaningful standard or scale of measurement. For the purpose of this discussion, let us use the expanded standard score scale from a standardized test to provide a numerical score for general reading skill in English. A numerical score requires some frame of reference to give it meaning, and we can supply a useful frame of reference by identifying the ranges of reading scores for national norm-group students at various age levels. Figure 1 illustrates four norm-group distributions showing the range of English reading scores (10th percentile to 90th percentile) at the beginning and end of second grade and the beginning and end of sixth grade. These percentile scales will be used as our scales of measurement, and they can easily be converted to NCE units.[1]

Figure 1 illustrates the gain that 20th-percentile norm-group students ordinarily achieve during a school year. (This percentile level was chosen for illustration since many Title I and Title VII students score in this range.) The white bar at the left of Figure 1 represents this amount, which we will refer to as "normal growth" for 20th-percentile second graders. Also shown in Figure 1 is the amount of gain that constitutes "normal growth" for 20th-percentile norm-group students at the sixth-grade level. It can easily be seen that the sizes of these gains vary across grade levels, a point that will be further discussed below.

We are now in a position to compare student gains with percentile levels. For example, it can be seen from Figure 1 that a Title I student who started the second grade at the 20th percentile would have to gain nearly twice as many points as the 20th-percentile students with normal growth in order to reach the 50th percentile in the spring. A 20th-percentile sixth grader would have to achieve over four times normal growth to reach the 50th percentile by spring.

Moreover, it is important not to conclude, even for the second grade, that doubling the amount of instruction or doubling the effectiveness of the instruction would be enough to raise the student to the 50th percentile. Normal growth is certainly due in part to classroom instruction, but it also includes all the effects of out-of-school learning and maturation, and these effects cannot be doubled so easily. It is also affected by the motivation of the student. In other words, normal growth is a result of:

• classroom instruction,

• out-of-school learning,

• maturation,

• motivation.



[1] NCEs are normalized standard scores with a mean of 50 and a standard deviation of 21.06. Because this scale is normalized, it is assumed to be equal-interval; that is, the length of the interval between any two adjacent scores in the scale is equal to the interval between every other pair of adjacent scores (Tallmadge and Wood, 1976).
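Because NCEs are defined as normalized standard scores with a mean of 50 and a standard deviation of 21.06, a national percentile can be converted to an NCE by finding the corresponding normal deviate (z-score) and rescaling it. The Python sketch below does this with the standard library's `statistics.NormalDist`; the function names are illustrative, and it assumes percentiles are expressed on a 0-100 scale strictly between 0 and 100.

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06  # standard deviation of the NCE scale, per the definition above
_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def percentile_to_nce(percentile: float) -> float:
    """Convert a national percentile (strictly between 0 and 100) to an NCE."""
    z = _STD_NORMAL.inv_cdf(percentile / 100.0)
    return NCE_MEAN + NCE_SD * z


def nce_to_percentile(nce: float) -> float:
    """Convert an NCE back to a national percentile."""
    z = (nce - NCE_MEAN) / NCE_SD
    return 100.0 * _STD_NORMAL.cdf(z)


# A student who stays at the 20th percentile has an NCE change of zero ("normal growth");
# moving from the 20th to the 50th percentile requires a gain of roughly 17.7 NCEs.
gap = percentile_to_nce(50) - percentile_to_nce(20)
print(round(percentile_to_nce(20), 1), round(gap, 1))  # 32.3 17.7
```

Working in NCEs has the convenient property that "normal growth" (no change in percentile standing) corresponds to an NCE change of zero, so any nonzero NCE change reads directly as growth above or below the norm group.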






[Figure: two panels, Grade 2 and Grade 6, each showing Fall and Spring ranges of national norm-group English reading scores at the 10th, 20th, 50th, and 90th percentiles on an expanded standard score scale (roughly 60 to 210). Bars mark normal growth for a 20th-percentile student, the maximum project impact, and the additional growth required for a 20th-percentile student to reach the 50th percentile.]

Figure 1. Impacts of regular classroom learning and Title I projects on achievement test scores in English.





Thus, even if we can double the amount or the effectiveness of classroom instruction, we should not expect to double the amount a student learns.

Impact of Title I projects. If it is true that the classroom is only one of several factors contributing to student learning, then even dramatic improvements in school instruction might produce rather modest gains. Existing data, though not conclusive, tend to bear this out. Analysis of data from a great many exemplary Title I projects suggests that, in terms of the scale in Figure 1, gains produced by projects are small. In fact, it is difficult to find convincing evidence of gains of even one-third of a standard deviation with respect to the national norm.[2] This amount has been added to the bar in Figure 1 to represent the maximum impact that might be expected from an exemplary Title I project. Of course, this is not a rigorously established limit, but based on available program evaluations, it appears to be a realistic value.
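To see what "one-third of a standard deviation" amounts to on the NCE scale (standard deviation 21.06), the Python sketch below converts it to NCE units and asks where a 20th-percentile student would land if that full amount were added on top of normal growth. The function names are illustrative, and treating the impact as a simple additive shift on the NCE scale is an assumption made only for this illustration.

```python
from statistics import NormalDist

NCE_SD = 21.06  # standard deviation of the NCE scale
_STD_NORMAL = NormalDist()


def percentile_to_nce(percentile: float) -> float:
    """Convert a national percentile (strictly between 0 and 100) to an NCE."""
    return 50.0 + NCE_SD * _STD_NORMAL.inv_cdf(percentile / 100.0)


def nce_to_percentile(nce: float) -> float:
    """Convert an NCE back to a national percentile."""
    return 100.0 * _STD_NORMAL.cdf((nce - 50.0) / NCE_SD)


max_impact_nce = NCE_SD / 3  # one-third of a standard deviation, about 7.02 NCEs

# Assumption: the project impact is an additive shift on the NCE scale.
start_nce = percentile_to_nce(20)
end_percentile = nce_to_percentile(start_nce + max_impact_nce)
print(round(max_impact_nce, 2), round(end_percentile, 1))  # 7.02 30.6
```

The result, about 7 NCEs, is the same size as the cutoff for a "large" gain used in the discussion of Title VII data later in this appendix, and even that maximum leaves a 20th-percentile student well short of the 50th percentile.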

To complete the picture, consider the growth scales in Figure 1 for sixth graders. Note that the spread between the 10th and 90th percentiles is greater for the older age group, but that normal growth for 20th-percentile students is considerably less than at second grade. This normal growth still includes the effects of out-of-school learning, maturation, and motivation, so the maximum Title I impact (again represented in the figure as one-third of a standard deviation) would require a project that was more than twice as effective as regular classroom instruction alone.

In short, normal growth looks rather small when measured against the percentile scale, and the amount of growth that can be directly attributed to classroom instruction is even smaller. Thus, even a dramatically effective Title I program, one in which instruction is several times as effective as that in the regular classroom, may raise student scores by only a few percentile points or NCEs per year. In most cases, evaluations have been designed to measure much larger gains than we can reasonably expect to find. In such evaluations, the relatively small program impacts that actually occur may be completely obscured by the amounts of error normally associated with an evaluation.

[2] This amount has been suggested as representing a "just noticeable difference" when comparing two groups on physical attributes such as height or weight (A. O. H. Roberts, 1977b). In the context of project evaluation, it has been used as an arbitrary criterion below which gains were considered of little educational significance (Tallmadge and Horst, 1976).

Impact of Bilingual Education Projects

In bilingual education programs it is more difficult to make generalizations about the size of achievement gains, for a number of reasons. In order to discuss the size of gains, it is necessary to have a meaningful scale of measurement (scores that can be referred to a familiar range of scores). Unfortunately, such scales of measurement are available only for major achievement tests in English, although work is currently being done to develop meaningful scales for some language proficiency tests and Spanish achievement tests.

It is also necessary to have meaningful standards of comparison in order to determine whether the amount of growth, as measured on the scale, is greater or less than the amount of growth that would be made by similar students participating in (1) similar bilingual programs or (2) traditional all-English classrooms. Some tests have norms designed to provide a standard of "normal growth" in bilingual programs, but such norms are not yet considered technically adequate. Standards that provide an adequate no-treatment expectation for students of limited English proficiency simply are not available. (See Section 6, Limits to the Usefulness of Norms.) Some project personnel have asked why they cannot simply use the English norms for the Spanish version of a test. This would be highly inappropriate, since it cannot be assumed that students in a bilingual program grow at the same rate as students in traditional norming populations (either in English or in their native language). There is no evidence that the equipercentile assumption holds true for students of limited English proficiency. In other words, it cannot be assumed that a group of limited-English students who score at the 30th percentile at pretest time, for example, would score at the 30th percentile at posttest time. They may exceed "normal growth" in some subjects at certain times and fall below it at other times, depending on a complex of factors.

In bilingual education programs, the size of the gains that can be expected, the language in which gains can be expected, and the grade level at which they can be expected depend on a number of factors. These include:

1. the type of students of limited English proficiency being served,

2. the program model being implemented, and

3. the general context in which the program operates.

For example, consider the case of students who are relatively balanced bilinguals from the time they enter school, but are also somewhat limited in English proficiency, and are receiving English instruction in all subject areas as well as some native language instruction. There are some reasons to expect that these students may make gains in English achievement similar to those made by students in Title I programs. If these students receive instruction primarily in the dominant language, English gains for that year may be less than the norms, and larger gains will be expected to appear after transfer to a greater amount of instruction in English. Now consider the case of students who are extremely limited in English proficiency at the time of pretesting. It seems reasonable to assume that achievement as measured by English tests may exceed "normal growth" if a great deal of English instruction is provided. If instruction is provided primarily in the dominant language, English gains may be less than "normal growth" and will be expected to appear at a higher grade level when transfer occurs.

Gains that can normally be expected will be discussed in relation to the three subject areas most stressed in bilingual programs: (a) oral language, (b) reading, and (c) math. Under each topic, gains will be discussed for the native language and for English.

Oral Language

Although oral language development in both the first and the second language is a major goal of bilingual programs, there is little data available to indicate what kind of growth can be expected in bilingual programs. Many programs administer oral proficiency tests for purposes of classification, but unfortunately gains are often not reported as part of the impact evaluation. [...] major studies have examined this issue, probably due to the time involved in individual test administration.

Native language. The amount of growth that can be expected in the native language depends on students' initial proficiency in the home language, the amount and quality of instruction, and language use in the home and community. Although norms are available for some tests, there is little data to indicate what "normal growth" is. In the bilingual PIP field test, only 7 of 19 sites reported both pre- and posttest data for oral proficiency in the native language. Due to the variety of tests used and the different types of scores reported (language proficiency level vs. raw score), it is not possible to arrive at any generalizations. Nevertheless, one important point should be made. Two sites demonstrated a loss in Spanish language proficiency on the Language Assessment Scales. While for English language tests it is assumed that a certain amount of growth from fall to spring is inevitable, due to maturation and other influences, this is not necessarily true for non-English languages in the U.S. Native language loss may be the norm in certain programs in certain communities.

English. There is also a lack of information concerning "normal growth" in English oral language. Logically, it might be expected that large gains would occur during the first and second years of a student's participation in a program, since curricular emphasis is placed on second language acquisition. Data from the PIP field test indicate that on one commonly used test, the Language Assessment Scales, students (n = 162) in the first grade gained an average of 13.7 raw score points per year (in 7 sites), and students (n = 102) in the second grade (in 3 sites) gained an average of 6.7 points (out of 100 total points possible).






Reading

Native language. The amount of growth that can be expected in native language reading scores depends on the initial native language proficiency of the students, the instruction provided, and the degree of use of the language in the home and community. Norms are available for some Spanish language achievement tests (see Section 6). Although these norms cannot provide a no-treatment expectation, and are not completely adequate technically, they can provide a very rough estimate of the reading growth expected.

English. Some data on English reading gains are available from a national study of Title VII bilingual programs (American Institutes for Research, Evaluation of the Impact of ESEA Title VII Spanish/English Bilingual Education Program, Volume I: Study Design and Interim Findings, February 1977). Although the study has some serious methodological flaws, such as a comparison group that was not sufficiently comparable to the treatment group, there is some information in the report that can be useful to evaluators. Students in a national sample of bilingual programs were pre- and posttested with the CTBS Reading test over a period of about five months. Table 1 illustrates average amounts of growth for students. The pre- and posttest scores are expressed in percentiles, and the amount of pre-to-post growth is expressed in percentiles and NCEs. The scores are reported for grades 2 through 6 for four different groups of students who were classified into language proficiency groups by their teachers for purposes of test taking. For example, an "English-dominant bilingual" was defined as a student whose teacher felt that s/he should take a reading test in both languages and a math test in English. The percentile scores represent the students' standing relative to a national norming sample. If there is no change in percentile standing from pretest to posttest, then it is usually assumed that "normal growth" has occurred for the students at that level. We do not know, however, how similar students would have performed without a bilingual program. These norms do not provide a no-treatment expectation; they only serve to compare students to the national average. A gain of 7 or more NCEs is considered large for purposes of this discussion, a gain of 3 to 6 NCEs is moderate, and a gain of 2 or less is minimal.
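These cutoffs can be applied mechanically. The Python sketch below converts a group's pretest and posttest national percentiles to NCEs (mean 50, standard deviation 21.06), rounds the change to a whole number, and labels it with the large/moderate/minimal cutoffs; applying the same cutoffs to losses is an assumption that mirrors how losses are described below. Run on the monolingual English reading percentiles in Table 1, it reproduces the rounded NCE changes shown there for that group. This is only a check on the arithmetic, not necessarily the procedure the original study used, which may have averaged NCEs over individual students. Function names are illustrative.

```python
from statistics import NormalDist

NCE_SD = 21.06  # standard deviation of the NCE scale
_STD_NORMAL = NormalDist()


def percentile_to_nce(percentile: float) -> float:
    """Convert a national percentile (strictly between 0 and 100) to an NCE."""
    return 50.0 + NCE_SD * _STD_NORMAL.inv_cdf(percentile / 100.0)


def classify_nce_change(change: int) -> str:
    """Label a whole-number NCE change: 7+ large, 3 to 6 moderate, 2 or less minimal."""
    size = abs(change)
    if size >= 7:
        magnitude = "large"
    elif size >= 3:
        magnitude = "moderate"
    else:
        magnitude = "minimal"
    if change > 0:
        return f"{magnitude} gain"
    if change < 0:
        return f"{magnitude} loss"
    return f"{magnitude} change"


# Monolingual English group, CTBS Reading Total: (pretest NP, posttest NP) by grade, from Table 1.
reading_me = {2: (15, 27), 3: (25, 21), 4: (30, 28), 5: (22, 18), 6: (20, 21)}

for grade, (pre, post) in reading_me.items():
    change = round(percentile_to_nce(post) - percentile_to_nce(pre))
    print(f"grade {grade}: {change:+d} NCEs, {classify_nce_change(change)}")
```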

Monolingual Spanish speakers scored from the 2nd to the 5th percentile on pretests and from the 3rd to the 6th percentile on the posttests. They showed gains ranging from 0 to 1 percentile points (0 to 3 NCEs). Lack of substantial growth in this area might be attributable to several factors, among them (1) floor effects that limit the extent to which gains can be detected, (2) curricular emphasis on English oral language development, and (3) reading instruction in the native language prior to introduction of English reading.

For the Spanish-dominant bilingual group, there were very large gains of 13 percentile points (16 NCEs) in the second grade. This may be due to increased English language proficiency as well as improved reading ability. In many programs, second grade students have had very little, if any, experience in English reading at pretest time, but by the end of the year they have transferred reading skills to English. Gains were moderate in the 5th grade and minimal in the 4th and 6th, and there is a moderate loss in the 3rd grade relative to the national norms.

English-dominant bilinguals demonstrated moderate gains in the 2nd and 5th grades and "normal growth" relative to national norms in the 3rd, 4th, and 6th grades. The monolingual English group showed a gain of 12 percentile points (9 NCEs) at second grade, moderate losses at the 3rd and 5th grades, and minimal changes in percentile standing in the 4th and 6th grades. It should be noted that these second graders scored at the 15th percentile level at pretest time, while the English-dominant bilinguals started out substantially higher, at the 25th percentile.

Caution must be exercised in interpreting "losses" relative to national norms. A drop in percentile standing (for example, from the 18th to the 16th percentile) could represent positive program impact if similar students would have dropped even further without a bilingual program.
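The point about "losses" can be made concrete with a small calculation. In the Python sketch below, the drop from the 18th to the 16th percentile comes from the example in the text, while the 12th-percentile posttest expected for similar students without a bilingual program is a purely hypothetical figure chosen for illustration; in practice such a no-treatment expectation would have to come from an adequate comparison group, which the text notes is rarely available.

```python
from statistics import NormalDist

NCE_SD = 21.06  # standard deviation of the NCE scale
_STD_NORMAL = NormalDist()


def percentile_to_nce(percentile: float) -> float:
    """Convert a national percentile (strictly between 0 and 100) to an NCE."""
    return 50.0 + NCE_SD * _STD_NORMAL.inv_cdf(percentile / 100.0)


pretest_pct = 18               # from the example in the text
posttest_pct = 16              # from the example in the text
expected_without_program = 12  # HYPOTHETICAL no-treatment expectation, for illustration only

# Change measured against the national norm group (a "loss").
change_vs_norms = percentile_to_nce(posttest_pct) - percentile_to_nce(pretest_pct)
# Change measured against what similar students might have scored without the program.
impact_vs_expectation = percentile_to_nce(posttest_pct) - percentile_to_nce(expected_without_program)

print(f"relative to national norms: {change_vs_norms:+.1f} NCEs")
print(f"relative to no-treatment expectation: {impact_vs_expectation:+.1f} NCEs")
```

The same posttest score yields a negative number against the norms and a positive one against the (hypothetical) no-treatment expectation, which is exactly why a percentile drop alone cannot be read as a program failure.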






Mathematics



Table 1 also displays percentile scores and gains made by the average Title VII student in mathematics computation. Monolingual English students and English-dominant bilinguals took the test in English, while the other two groups took the test in Spanish. The pattern is quite different from that of the reading scores. Of the 20 groups reported across grade levels and language groups, 17 made gains relative to the national norms.
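The count of 17 gains among 20 groups can be checked against the mathematics computation percentile changes in Table 1. The Python sketch below simply transcribes those change values (grades 2 through 6 for each judged language dominance group) and counts the positive ones; a change of zero is counted as no gain.

```python
# Percentile-point changes in CTBS Mathematics Computation, grades 2-6, transcribed from Table 1.
math_change = {
    "Monolingual English": [10, 11, 11, 1, -6],
    "English-Dominant Bilingual": [9, 4, 6, 0, 4],
    "Spanish-Dominant Bilingual": [30, 11, 9, 1, -7],
    "Monolingual Spanish": [16, 13, 10, 7, 8],
}

# Flatten the four groups into one list of 20 grade-by-group changes.
all_changes = [c for changes in math_change.values() for c in changes]
gains = sum(1 for c in all_changes if c > 0)
print(f"{gains} of {len(all_changes)} groups gained")  # 17 of 20 groups gained
```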

The monolingual Spanish group in the 2nd and 3rd grades made gains of 16 and 13 percentile points, respectively (10 and 9 NCEs), relative to national norms, while moderate gains were demonstrated in the 4th, 5th, and 6th grades. The Spanish-dominant bilingual group showed very large gains at second grade (30 percentile points, 17 NCEs), moderate gains at 3rd and 4th grade, minimal change at 5th grade, and a moderate loss at 6th grade.

The English-dominant bilingual group started with the highest 2nd-grade percentile standing of the four language dominance groups at pretest time: 37th percentile, compared to 28th for the monolingual English group and 17th and 18th for the other two groups. The English-dominant bilinguals exhibited moderate gains at the 2nd, 4th, and 6th grades, a minimal gain at 3rd grade, and "normal growth" at the 5th grade. The monolingual English group exhibited large gains at the 3rd grade level (11 percentile points, 7 NCEs), moderate gains at 2nd and 4th, a minimal gain at 5th, and a moderate loss at 6th.

In summary, these data indicate that growth patterns for students in bilingual programs differ from growth patterns of students in a national norming sample. For example, very large gains in national percentile standing were demonstrated in English reading and math for Spanish-dominant bilingual students in the second grade. The information presented here may provide some assistance to districts in setting goals. A word of caution is in order, however. These data represent bilingual programs of a variety of types, and it is not possible to determine which type of program produced which gains. There may be interactions among program type, language group, and grade level that affect these data. In addition, there may also be large amounts of error due to floor effects for some groups. Gains to be made in individual projects will depend on a number of factors, including characteristics of students and the type and quality of instruction provided.






Table 1

National Percentiles (NP) for CTBS Reading Total and Mathematics Computation Means
by Judged Language Dominance Group: Title VII Hispanic Students

                                      Monolingual English          English-Dominant Bilingual
                                  Pretest Posttest   Change        Pretest Posttest   Change
Outcome Variable          Grade     NP      NP     %ile  NCEs        NP      NP     %ile  NCEs
CTBS Reading Total Score    2       15      27      12     9         25      29       4     3
(English)                   3       25      21      -4    -3         25      24      -1    -1
                            4       30      28      -2    -1         22      24       2     1
                            5       22      18      -4    -3         19      27       8     5
                            6       20      21       1     1         18      18       0     0
CTBS Mathematics            2       28      38      10     6         37      46       9     5
Computation Score           3       29      40      11     7         32      36       4     2
(English)                   4       37      48      11     6         42      48       6     3
                            5       32      33       1     1         33      33       0     0
                            6       40      34      -6    -4         34      38       4     3

                                  Spanish-Dominant Bilingual       Monolingual Spanish (a)
                                  Pretest Posttest   Change        Pretest Posttest   Change
Outcome Variable          Grade     NP      NP     %ile  NCEs        NP      NP     %ile  NCEs
CTBS Reading Total Score    2        5      18      13    16          5       6       1     2
(English)                   3       17      13      -4    -4          4       4       0     0
                            4       14      15       1     1          3       3       0     0
                            5       18      25       7     5          3       3       0     0
                            6       33      36       3     1          2       3       1     3
CTBS Mathematics            2       19      49      30    17         18      34      16    10
Computation Score           3       28      39      11     6         20      33      13     9
(Spanish)                   4       49      58       9     5         31      41      10     5
                            5       30      31       1     1         19      26       7     4
                            6       45      38      -7    -3         23      31       8     6

These data are taken from: American Institutes for Research, Evaluation of the Impact of ESEA Title VII Spanish/English Bilingual Education Program, Volume I: Study Design and Interim Findings, February 1977.

(a) Monolingual Spanish students at all grade levels took the CTBS/S, Level C, as both a pretest and posttest. CTBS norms used here were those for the form and level taken by all other judged language dominance groups at a particular grade level.



APPENDIX B
GUIDELINES FOR OTHER EVALUATION AREAS



Contents

B-1 Evaluation of Affective Impacts
B-2 Evaluation of Staff Development
B-3 Evaluation of Parent/Community Involvement



Introduction 



These sections address the evaluation of student affective growth, staff development, and parent/community involvement. Virtually all bilingual education programs have goals in these areas and spend considerable time and funds in activities designed to address them. There is a lack of information, however, on methods and issues in evaluating these areas, and program evaluations, often because of time constraints, give them low priority.

In an attempt to improve local evaluations and to broaden their scope, RMC provided technical assistance to the bilingual PIP field-test sites. The evaluators were encouraged (a) to employ measures other than achievement tests and (b) to evaluate goals other than student achievement goals.

The following procedure was used in developing these sections: (a) evaluation reports from the PIP sites were reviewed and analyzed; (b) current relevant literature was reviewed; (c) recommendations and suggestions were developed; and (d) materials were sent to all 19 PIP field-test sites in the hope of improving the evaluations.

The format of Appendix B differs from that of the previous eight sections, since each section (B-1, B-2, and B-3) consists of the document that was sent to each site participating in the bilingual PIP field test. Each section contains the following: (a) a review of practices employed by bilingual programs to evaluate the component, including the most common practices and other practices used by at least one site; (b) a discussion of technical issues; and (c) recommendations for improving the evaluation of this component.

In addition, the section on evaluation of affective impacts includes recommendations for the use of two unobtrusive measures of project impact: attendance and retentions.







B-1 



EVALUATION OF AFFECTIVE IMPACTS 



Common Practices

Most bilingual projects have explicit goals for student affective growth. The most common goals of the projects in the bilingual PIP field-test study were to:

• increase awareness of and appreciation for the child's own culture and the dominant culture, and to

• improve self-concept.

A number of sites that had stated affective goals employed no measures and reported no results in this area. Of those sites that did address student affect, the most common approaches were the following:

• paper-and-pencil, self-report measures of self-concept, administered pre and post;

• paper-and-pencil, self-report measures of cultural attitudes, administered pre and post;

• documentation of classroom and outside cultural activities offered by the project;

• reporting the percentage of students who participated in a given number of cultural events in the classroom.
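For the last approach listed above, the statistic is simple to compute once each student's participation in classroom cultural events has been tallied. The Python sketch below is one way to do it; the student tallies and the threshold of three events are hypothetical, and the function name is illustrative.

```python
def percent_with_at_least(event_counts: dict[str, int], threshold: int) -> float:
    """Percentage of students who participated in at least `threshold` cultural events."""
    if not event_counts:
        raise ValueError("no students recorded")
    qualifying = sum(1 for n in event_counts.values() if n >= threshold)
    return 100.0 * qualifying / len(event_counts)


# Hypothetical tallies of classroom cultural events attended per student.
tallies = {"s1": 0, "s2": 2, "s3": 3, "s4": 5}
print(percent_with_at_least(tallies, 3))  # 50.0
```

Reporting the threshold alongside the percentage (for example, "50% of students took part in three or more events") keeps the figure interpretable, since the percentage changes with the number of events chosen.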

Other approaches used by at least one site were:

• teacher rating scale to assess students' social behavior;

• teacher rating scale to assess students' school-related behavior and attitudes;

• teacher rating scale to assess student attitude toward self as a bilingual and toward others as bilinguals;

• teacher rating scale to assess students' participation in classroom and playground;

• paper-and-pencil, self-report measures of attitude toward school and toward school subjects, administered 3-4 times during the year.



Procedures Recommended to Tryout Sites

Immediate effects on students. In attempting to describe the effects of a bilingual project on the LES students, it is essential to examine them from a broader perspective than simply noting changes that occur over one year. Evaluators should describe both immediate and cumulative effects. The very nature of a bilingual project makes it different from other types of special projects in one important way. In most special projects, it is assumed that the normal treatment is meaningful to the students, at least in the sense that they can comprehend the language used in instruction, but that the special project consists of a better method of teaching. The situation is different in a bilingual program. The normal, all-English program cannot be "meaningful" (in the sense of the Lau decision) if children are not yet fluent speakers of English. Instruction is meaningful to children only to the extent that they can understand what is said to them and participate in verbal exchanges with teachers and other students throughout the day.

For this reason, the first question that needs to be addressed by districts in evaluating effects of the project on students is: To what extent are students receiving a meaningful education? This question can be broken down into other questions such as: To what extent can teachers and children communicate with one another? What proportion of the day is meaningful to children in terms of the degree to which they speak and comprehend the language of instruction? To what extent are children able to relate to and profit from the instructional materials? These are complex questions to answer due to the range of language proficiency levels of children and the inadequacy of measurement techniques. Nevertheless, these immediate benefits to children should be addressed. Although they are obvious to bilingual educators, they are not always obvious to others; and although the long-range effects of such instruction should show up in test scores, this is not always the case, due to short-range evaluation designs and poor tests administered under questionable conditions.

Specification of goals. Measuring benefits to students in the affective domain is a tricky business for a number of reasons. The goals for the affective domain are often broad and vague. For example, the goal of improving self-concept is open to many interpretations. It is a controversial goal as well, since it is not clear that LES students necessarily have low self-concepts, nor is it clear what the causal relationship is between self-concept and achievement. It might be made more specific and more manageable by breaking it down into various components. A project might set a goal that students in the bilingual project will improve their opinion of themselves as successful readers, for example, or as successful math students.

Causes of affective changes. Second, most proposals and evaluation reports do not state precisely why project features are expected to bring about changes in student attitudes. In some projects it is expected that self-concept will improve through an understanding of the cultural heritage associated with both languages (see, for example, Venceremos Project Management Directory, p. 86). For others it is implied that improved attitudes toward self and others are expected as a result of (1) accepting and using the language of the child; (2) providing successful learning experiences; (3) integrating the culture of the child into the curriculum; (4) involving parents in classroom and other activities; and (5) employing bilingual, bicultural teachers who serve as role models. For still other sites it is implied that the project as a whole will bring about affective changes in students.

Many projects measure one chosen aspect of student attitudes and report
the results without discussing the possible reasons for the results. If
improvements are expected to be due to one of the project features
mentioned above, then a crucial step is to state whether that particular
feature was implemented. For example, if project personnel expect the
cultural component to influence student self-concept, then it would be
useful to describe the nature and extent of the cultural component that
was actually implemented. If no cultural component was implemented, or if
it was very minimal, then there is no reason to expect that it (or a lack
of it) affected self-concept. Likewise, if improved self-concept is
expected to result from introducing concepts in the native language, and
this did not occur, there is no reason to expect to achieve the affective
objective. Evaluators should state, to the extent possible, which project
features, or combination of features, are expected to produce affective
changes. They should then discuss to what degree those features were
implemented. If they were not implemented, or were improperly
implemented, then it is not possible to attribute changes in student
affective characteristics to those features. It is suggested that
evaluators focus on the processes that are expected to bring about
changes, to see that these processes are in fact occurring.

Measurement. Bilingual PIP field-test sites that used affective measures
made an attempt to locate the best measures available, but the choice of
adequate measures (particularly in two languages) is very limited. Most
sites used paper-and-pencil self-report instruments or teacher rating
scales. Self-report instruments are very unreliable for young children,
since social desirability and events of the moment have a great influence
on responses. Teacher rating scales are more likely to be reliable,
particularly if several measures are taken longitudinally. A variety of
unobtrusive measures can also be used. Although there are always serious
questions of validity and reliability associated with any affective
measure, sites should choose the best measures possible.

One site administered an affective test and did not report the results,
claiming that the test was not valid and reliable. Another site reported
results of a locally developed measure, but discounted the results for
similar reasons. If the reliability and validity of a locally developed
test are unknown, these parameters should be investigated. If this is
not possible, then it might be better to choose a commercially available
instrument with established psychometric properties.
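The text does not say how a site should investigate the reliability of a locally developed test. One common option for internal consistency, offered here only as an illustration, is Cronbach's alpha. The sketch below assumes item-level scores are available for every student who took the test; the function name is hypothetical. Alpha speaks only to reliability: it says nothing about whether the test is valid for the attitude it is meant to measure.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Estimate internal-consistency reliability (Cronbach's alpha).

    scores: one list per student holding that student's item scores;
    every student must have the same number of items.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across students.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the students' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)
```

For example, four students answering three items as [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 1]] give an alpha of about 0.95. Using population variance for both the items and the totals is a deliberate choice; the ratio, and therefore alpha, is the same as with sample variance.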

Evaluating affective changes is problematic, since it is impossible to
measure attitudes directly. Because an attitude is a hypothetical
construct generally considered to be composed of feelings, behaviors, and
knowledge or beliefs, it is necessary to choose possible indicators of an
attitude, measure these, and make inferences about the attitude. Some
suggestions concerning the kinds of attitudes that can be measured and
the possible manifestations of these attitudes are presented in the
outline entitled "Approaches to Evaluating Affective Impacts." The
outline includes approaches to (1) evaluation of the immediate effects on
students, (2) evaluation of the instructional strategies intended to
bring about attitudinal changes, and finally (3) evaluation of the
attitudinal changes. Each item preceded by a bullet (•) is simply an
example, and there may be many others. The purpose of this report and the
outline is to assist sites in exploring the variety of ways in which a
district can describe project benefits to students. The number of
approaches used and the extent of their use will depend, of course, on
time and financial constraints. It is hoped that the suggestions provided
here will help districts make better informed choices from a number of
options.






Approaches to Evaluating Affective Impacts



A. Evaluation of Immediate Project Benefits to Students

   1. Instructional features contributing to a "meaningful" education
      (in the sense of Lau)
      • presence of teachers and teacher aides who speak the language
        of the child
      • acceptance and use of the language of the child for
        instructional and other purposes
      • use of instructional materials written in the language of the
        child

   2. Immediate potential effects on students
      • ability to communicate with teachers and other students
      • ability to profit from instruction and participate more fully
        in other activities
      • ability to relate to and profit from instructional materials

   3. Measurement techniques
      • language-use observation instrument
      • teacher self-report of language use
      • teacher rating scale or questionnaire to evaluate materials



B. Evaluation of Processes Leading to Affective Changes in Students

   1. Processes expected to lead to affective changes in students
      • integrating the child's culture into the curriculum
      • providing successful learning experiences
      • accepting and using the language of the child
      • establishing good relations between home and school
      • providing role models
      • teaching the minority language to the majority group

   2. Measurement techniques
      • classroom observation
      • interviews with appropriate staff
      • rating scales
      • self-report in upper grades






C. Evaluation of Student Attitudes

   1. Attitudes toward self
      a. kinds of attitudes toward self
         • successful reader
         • in control (locus of control)
         • bilingual
         • successful math student
         • motivated
         • active participant in classroom
         • ethnic group member
         • creative and able to contribute
      b. manifestations of attitudes toward self
         • student comments
         • student non-verbal behavior
         • student language use
      c. measurement techniques
         • teacher rating scale
         • self-report, paper-and-pencil test
         • student interview

   2. Attitudes toward others
      a. components of attitudes toward others
         • respect for other races or ethnic groups
         • respect for other cultures
         • respect for other languages
      b. manifestations of attitudes toward others
         • language use at school
         • interethnic play at school
      c. measurement techniques
         • sociometrics
         • language-use observation instrument
         • interethnic interaction observation instrument
         • rating scale

   3. Attitudes toward school
      a. kinds of attitudes toward school
         • sense of belonging
         • particular school subjects (e.g., attitude toward reading)
         • school subjects in a particular language (e.g., Spanish
           reading)
         • academic and social activities






      b. manifestations of attitudes toward school
         • attendance
         • active participation in activities
         • student comments
         • retentions
         • willingness to share school experiences with family
      c. measurement techniques
         • attendance records
         • teacher rating scale
         • observation instrument
         • self-report, paper-and-pencil test
         • retention records
         • parent interview or questionnaire

   4. Attitudes toward home
      a. components of attitudes toward home
         • home language
         • home culture
         • alienation between home and school
      b. manifestations of attitudes toward home
         • willingness to speak home language even after English is
           mastered
         • willingness to share items and stories from home
      c. measurement techniques
         • rating scale
         • classroom observation
         • self-report
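Sociometrics appears in the outline as one way to measure attitudes toward others. As a minimal sketch, assume each student was asked to name classmates he or she would like to play or work with, and that each student's ethnic or language group is on record; the function name `cross_group_share` and the data layout are hypothetical. The share of nominations that cross group lines can then be compared between fall and spring, or between project and comparison classrooms.

```python
def cross_group_share(choices, group_of):
    """Share of sociometric nominations that cross group lines.

    choices: dict mapping each student to the list of classmates named.
    group_of: dict mapping each student to a group label.
    Returns 0.0 when no nominations were made.
    """
    total = 0
    crossing = 0
    for chooser, named in choices.items():
        for classmate in named:
            total += 1
            # A nomination "crosses" when the two students differ in group.
            if group_of[classmate] != group_of[chooser]:
                crossing += 1
    return crossing / total if total else 0.0
```

Because the share also depends on how many students of each group are in the class, it is most meaningful when the classes being compared have similar composition.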






Unobtrusive Measures of Project Impact:



Attendance and Retentions 



Project evaluations generally rely heavily on tests and questionnaires.
Since these techniques require the cooperation of a respondent, a great
deal of time is often involved, and the measure itself can contaminate
the response. Teachers already complain of overtesting, and it is
difficult to obtain valid and reliable test results for children in the
early grades.

For these reasons, we encourage sites to broaden their range of
evaluation methodologies and to consider exploiting a variety of
measurement possibilities. No measurement technique is without bias, but
combining measurement techniques with different kinds of biases can give
a more complete picture of what has occurred during the life of a
project.

Unobtrusive measures do not require any sort of response and so do not
interfere in any way with the students' school day. Two such measures
worth considering for use in your district are records of attendance and
retentions. If there is any indication that attendance has improved or
retentions have been reduced as a result of the project over the last
two years (or longer, if a bilingual project was already in operation),
then these data would be worth examining. It is particularly important
to examine these issues if they were addressed in a needs assessment or
are project goals.

The chart illustrates comparisons that might be employed. Retentions in
the project school for the current year (A) can be compared to
retentions before the project was installed (B). The current project
school (A) can be compared to a similar school that has no
well-established bilingual project (C). If neither of these comparisons
is possible, then district, regional, or national historical data can be
used (D).






No matter which comparison is chosen (A to B, A to C, or A to D), be
sure to compare the project group to a comparison group with similar
characteristics. For example, one might make any or all of the
following comparisons:

1. LES to LES

2. Spanish-surnamed to Spanish-surnamed

3. FES to FES

4. Total school to total school

It is most desirable to compare LES project students to LES comparison
students. If, however, the language proficiency characteristics of
comparison students are not known, or if the criteria for designating
comparison students as LES were substantially different from the
criteria applied to project students, then it would be better to compare
all Spanish-surnamed project students to all Spanish-surnamed comparison
students. Gathering data on FES students or on the total school
population serves to establish the comparability of the project
school(s) to the comparison school(s).
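The matching rule in this paragraph (LES with LES, Spanish-surnamed with Spanish-surnamed, and so on) can be made explicit in a small calculation. The sketch below is an illustration under assumed inputs: each group's counts of retained and enrolled students, for the project and for whichever comparison (B, C, or D) was chosen. The function names are hypothetical.

```python
def retention_rate(retained, enrolled):
    """Percent of students retained in grade (retained / enrolled * 100)."""
    if enrolled == 0:
        raise ValueError("no students enrolled in this group")
    return 100.0 * retained / enrolled


def compare_groups(project, comparison):
    """Project minus comparison retention rate for each matched subgroup.

    project, comparison: dicts mapping a subgroup label (e.g. "LES",
    "Spanish-surnamed", "FES", "Total school") to (retained, enrolled).
    Only subgroups present in both dicts are compared, so LES is matched
    only with LES, and so on. A negative value means fewer retentions
    in the project group.
    """
    result = {}
    for group, counts in project.items():
        if group in comparison:
            result[group] = retention_rate(*counts) - retention_rate(*comparison[group])
    return result
```

With made-up counts of 3 of 60 LES project students retained against 9 of 60 in the comparison school, the LES difference is -10 percentage points (5% versus 15%). A subgroup that appears on only one side is skipped rather than compared with a dissimilar group.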

Interpretation of results must be made in light of the comparison used
and must take into account the limitations of the available data. While
there may be other influences that affected retention patterns, such as
major policy changes, results should be reported if it is likely that
they are due at least in part to the project.

We have included two worksheets, one for gathering attendance data and
one for retention data. They may be of assistance to you should you
decide that this information would be appropriate for your evaluation.
We would appreciate your comments on these worksheets. If you are already
using a similar procedure or have relevant information we might share
with other sites, please let us know.




Possible Comparisons for Attendance and Retention Data

[Chart: A. Current project school, 1978-79; B. Same school before
project, 197. Key: FES; NES/LES; Spanish-surnamed.]



Retentions Worksheet



Steps

1. Determine what data are available.
2. Choose a comparison.
3. Calculate retentions of the appropriate groups listed below for
   project students.
4. Calculate retentions of the same groups for comparison students.

LES Retainees (What percent of LES project students were retained?)

   a. Number of LES project students retained in 1978-79:
      K ____   1st ____   2nd ____   3rd ____   4th ____

   b. Total number of LES students in project in 1978-79:
      K ____   1st ____   2nd ____   3rd ____   4th ____

   c. Percent of LES project students retained (a/b = %):
      K ____   1st ____   2nd ____   3rd ____   4th ____

Spanish-surnamed Retainees (What percent of Spanish-surnamed project
students were retained?)

   a. Number of Spanish-surnamed project students retained in 1978-79:
      K ____   1st ____   2nd ____   3rd ____   4th ____

   b. Total number of Spanish-surnamed students in project in 1978-79:
      K ____   1st ____   2nd ____   3rd ____   4th ____

   c. Percent of Spanish-surnamed project students retained (a/b = %):
      K ____   1st ____   2nd ____   3rd ____   4th ____



FES Retainees (What percent of FES project students were retained?)

   a. Number of FES project students retained in 1978-79:
      K ____   1st ____   2nd ____   3rd ____   4th ____

   b. Total number of FES students in project in 1978-79:
      K ____   1st ____   2nd ____   3rd ____   4th ____

   c. Percent of FES project students retained in 1978-79 (a/b = %):
      K ____   1st ____   2nd ____   3rd ____   4th ____

Total School Retainees (What percent of students in the project
school(s) were retained?)

   a. Number of retainees in project school(s) in 1978-79:
      K ____   1st ____   2nd ____   3rd ____   4th ____

   b. Total number of students in project school(s) in 1978-79:
      K ____   1st ____   2nd ____   3rd ____   4th ____
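Line c of each block above is a/b expressed as a percent, figured separately for each grade. A minimal sketch, assuming the worksheet counts have been entered into two dictionaries keyed by grade (the function name and data layout are hypothetical); the same function applies to any of the groups, including the total school counts.

```python
GRADES = ["K", "1st", "2nd", "3rd", "4th"]


def percent_retained_by_grade(retained, enrolled):
    """Worksheet line c: a/b as a percent, grade by grade.

    retained: line a, number of students retained in each grade.
    enrolled: line b, total number of students in each grade.
    Returns a dict grade -> percent rounded to one decimal place,
    or None for a grade with no students enrolled.
    """
    percents = {}
    for grade in GRADES:
        a = retained.get(grade, 0)
        b = enrolled.get(grade, 0)
        percents[grade] = round(100.0 * a / b, 1) if b else None
    return percents
```

For example, 3 retainees among 38 first graders gives 7.9%.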




Attendance 



Project school

Source: Each teacher's attendance records.

Method: Calculate the mean percent of absences for students in the
project school:

   1. For each student, divide the number of days absent by the number
      of days enrolled. (Example: 11/176 = 6.25%)

   2. Calculate the mean percent of absences for the entire group.
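The two steps above can be sketched as follows, assuming each student's days absent and days enrolled have been copied from the teachers' records (the function name is hypothetical). Averaging each student's percent gives every student equal weight, which is not the same as dividing total absences by total days enrolled when students were enrolled for different lengths of time.

```python
def mean_percent_absent(records):
    """Mean percent of absences for a group of students.

    records: list of (days_absent, days_enrolled) pairs, one per student.
    Step 1: each student's absences as a percent of days enrolled.
    Step 2: the mean of those percents for the whole group.
    Students with zero days enrolled are skipped.
    """
    percents = [100.0 * absent / enrolled
                for absent, enrolled in records if enrolled > 0]
    if not percents:
        raise ValueError("no students with days enrolled")
    return sum(percents) / len(percents)
```

A single student absent 11 of 176 days gives 6.25%, matching the example above.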



Comparison school

Source: Each teacher's attendance records.

Method: Same as for project students.

Alternative methods of calculating attendance

Methods:

   • Calculate mean attendance for only those students who were
     enrolled for at least 65% of the school year (which is an estimate
     of a student's enrollment period between pre- and posttesting).

   • Calculate mean attendance for those students who participated in
     pre- and posttesting.

   • A less desirable approach, but perhaps a more realistic one for
     obtaining an accurate comparison, is to calculate mean attendance
     for all project students enrolled during the year. This calculation
     would include students who may have been enrolled for any period of
     time and moved away. This approach can be used when the comparison
     group does not have data that exclude the mobility factor. The
     comparison can be either a non-project school or historical data
     available from a particular school or, as a last resort, district
     historical data (see chart).
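The three selection rules differ only in which students are averaged. The sketch below assumes a simple per-student record; the `Student` fields, the `school_days` parameter, and the rule names are hypothetical, while the 65% threshold comes from the first method above. Whichever rule is chosen should be applied identically to the comparison group.

```python
from dataclasses import dataclass


@dataclass
class Student:
    days_absent: int
    days_enrolled: int
    pretested: bool
    posttested: bool


def mean_attendance(students, school_days, rule="all"):
    """Mean percent attendance under one of three selection rules.

    rule="min_enrollment": students enrolled at least 65% of the year
    rule="tested": students with both pre- and posttest scores
    rule="all": every student enrolled at any time during the year
    """
    if rule == "min_enrollment":
        group = [s for s in students if s.days_enrolled >= 0.65 * school_days]
    elif rule == "tested":
        group = [s for s in students if s.pretested and s.posttested]
    elif rule == "all":
        group = list(students)
    else:
        raise ValueError(f"unknown rule: {rule}")
    group = [s for s in group if s.days_enrolled > 0]
    if not group:
        raise ValueError("no students meet the selection rule")
    # Percent of enrolled days each student attended, then the group mean.
    rates = [100.0 * (s.days_enrolled - s.days_absent) / s.days_enrolled
             for s in group]
    return sum(rates) / len(rates)
```

In a 180-day year, a student enrolled for only 80 days is dropped under "min_enrollment" but counted under "all", which is why the "all" figure tends to reflect student mobility.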







Possible Interpretations of Positive Results



Parents:  Parents may see the project as more relevant and beneficial;
          therefore they may be more persuasive in seeing that their
          children attend school.

Students: Students may find their school experience more relevant and
          less traumatic, and therefore they may be more inclined to
          attend school.

School:   The school may be affecting the behavior of students and
          parents by providing better school-home relations, thereby
          positively affecting students' attendance rate.



Attendance Reporting Form*

        1. NES/LES              2. Spanish Surname      3. FES
Grade   Project   Comparison    Project   Comparison    Project   Comparison
K
1
2
3

*Use the sections appropriate for your site, depending on what
information may be available and what questions your project wants to
answer. For example, if your district does not have attendance records
broken down by language group, you may wish to use column 2 only.






B-2

EVALUATION OF STAFF DEVELOPMENT

Common Practices

A review of evaluation reports from sites participating in the field
test of Bilingual Project Information Packages revealed that the most
common approach to the evaluation of the staff development component
was to:

   • provide description and/or documentation of the workshops and
     other training activities that were provided, and to

   • evaluate the content of the training activities.

The description of workshops and other activities usually consisted of a
list and some sample outlines of presentations. In order to evaluate the
content of the sessions, most sites had workshop participants fill out a
combination rating form/questionnaire in which they evaluated sessions
in terms of criteria such as expertise of presenter, relevancy, clarity,
practicality, meeting stated objectives, and meeting needs. The results
of these evaluations were summarized across participants, and actual
comments made by participants were often included in the summary.
Several such summary sheets, representing several workshops, were
generally included in an appendix. The results were then summarized
across several or all sessions for the year, and the conclusion reached
was often something like "With one exception, all workshops met their
objectives and provided useful practical information for teachers." The
majority of sites evaluated their staff development component at this
level.

A number of sites employed additional techniques, including the
following:

   • A needs assessment administered in the fall.

   • Pre- and posttests on the content of each workshop, administered
     to participants at that workshop.

   • Classroom observation to determine areas in which training is
     needed.

   • A questionnaire administered to a sample of district (non-project)
     staff to determine the extent to which they received information
     concerning the project.

   • A pre-post (fall-spring) test for project staff measuring
     knowledge of the cultures represented in class.

   • A questionnaire to assess knowledge (self-report) of the project's
     goals and objectives.

   • Reporting of university credits, special certificates, or degrees
     received during the project life.



These approaches from a sample of programs represent an attempt to
conduct a broader evaluation of the staff development component. Since
staff development is the main approach to implementing many bilingual
programs, it is important to select evaluation strategies that will
provide as thorough and accurate an assessment as possible of the
effects of training.
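For pre- and posttests on workshop content, a basic summary keeps only participants who took both tests, so that the pre and post means describe the same people. The sketch below works under that assumption; the data layout and function name are hypothetical. On its own, a gain shows that scores rose, not that the training caused the rise.

```python
from statistics import mean


def prepost_summary(pairs):
    """Summarize pre/post scores for participants who took both tests.

    pairs: dict mapping participant id -> (pretest, posttest);
    use None for a missing score.
    Returns (number matched, mean pretest, mean posttest, mean gain).
    """
    matched = [(pre, post) for pre, post in pairs.values()
               if pre is not None and post is not None]
    if not matched:
        raise ValueError("no participant has both scores")
    pre_mean = mean(pre for pre, _ in matched)
    post_mean = mean(post for _, post in matched)
    return len(matched), pre_mean, post_mean, post_mean - pre_mean
```

For example, scores of (6, 9) and (4, 8), plus a participant with no posttest, yield 2 matched participants, means of 5 and 8.5, and a mean gain of 3.5.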

Recommendations

The staff development component can be evaluated through a variety of
approaches, depending on who or what is evaluated, in terms of what
specific qualities or characteristics, and over how much time. What is
evaluated has generally been limited to the content of the pre- or
in-service sessions. But to adequately assess the value of a staff
development program, the effects of training sessions on program staff
must also be examined. One general goal is to improve the teachers' and
aides' performance in bilingual instruction, and the results can be
determined by answering the following questions: How has classroom
performance changed? How have knowledge, skills, and attitudes changed?
How have language skills improved? An adequate evaluation should try to
answer these questions. Another goal that has been receiving increased
emphasis is the upgrading of management and evaluation skills for
program staff who perform these functions. It is just as important for
project directors to be trained in communication skills, for example,
as it is for teachers to be trained in instructional techniques.

The ultimate benefits of a staff development component should be its
effects on the quality of the students' education. It is more difficult
to measure effects on students and to attribute them to training
sessions, but in some cases districts may be able to do this. If, for
example, teachers attend a session on "Cooperation in Learning Centers,"
an observer should be able to use a simple observation instrument to
document the extent to which this kind of student behavior changes over
time.
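A simple observation instrument of this kind is often an interval checklist: at fixed intervals during a visit, the observer marks whether the behavior (here, students cooperating at a learning center) is occurring. Assuming that design, which the text does not specify, the percent of intervals with the behavior can be tracked across visits as sketched below; the function names are hypothetical.

```python
def percent_of_intervals(observations):
    """Percent of observation intervals in which the behavior was seen.

    observations: list of booleans, one per interval (True = seen).
    """
    if not observations:
        raise ValueError("no intervals recorded")
    return 100.0 * sum(observations) / len(observations)


def trend(visits):
    """Percent for each visit plus the change from first to last visit.

    visits: list of (label, observations) pairs in chronological order.
    """
    if not visits:
        raise ValueError("no visits recorded")
    percents = [(label, percent_of_intervals(obs)) for label, obs in visits]
    change = percents[-1][1] - percents[0][1]
    return percents, change
```

Keeping the interval length and number of intervals the same at every visit makes the percents comparable over time.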

In addition to assessing the effects of training on teachers, aides, and
students, it is possible to evaluate products resulting from training
activities. If part of the in-service program involves materials
development, then the resulting materials can be listed, described, and
evaluated in terms of their relevance, usefulness, and other features.
Sites may also choose to evaluate the management of the staff
development component in order to provide useful information to improve
next year's training program.

The term "staff" can be defined as project staff, or more broadly as all
district staff, or even more broadly as staff from other districts. If
non-project staff are included in in-service sessions, or if they
receive information about the project, then the effects of these efforts
can be evaluated and discussed. If the practices employed by the project
are so innovative or successful that they are influencing neighboring
districts, then this is an important benefit to others resulting from
the project.

A number of suggestions are expressed in outline form in the attached
framework entitled "Approaches to Evaluation of Staff Development
Component." The purpose of the framework is to help explore the variety
of ways in which a district can describe the benefits resulting from
staff training.

There are six major sections, corresponding to the topics discussed
above. Within each of these topics, suggestions are offered in three
areas: (1) the time frame for evaluation, (2) the characteristics
assessed, and (3) assessment methods. Each item preceded by a bullet (•)
is simply a suggestion, and the suggestions are not intended to be
all-inclusive.

The time frame for evaluating staff training can be viewed in several
ways. Each event can be evaluated; for example, the content of a
workshop, or teacher performance in the classroom, can be evaluated
after each workshop. Other approaches are to look at changes that occur
from fall to spring, from fall to fall, or cumulatively over several
years. Some specific suggestions are offered for characteristics to be
assessed in the evaluation. These will depend on who or what is being
assessed and on the nature of the training that was offered. In
addition, some assessment techniques are suggested. Measurement is
problematic for this program component, since it is difficult to obtain
valid and reliable measures of changes resulting from training. If it
proves unfeasible to employ an assessment instrument of some sort, then
simple description should be used. The number of approaches used for
evaluating staff development and the extent of their use will depend,
of course, on time and financial constraints, but at least program staff
can make informed choices based on a number of options.






Approaches to Evaluation of Staff
Development Component



A. Evaluation of content (workshops, presentations, courses,
   conferences)

   1. Time frame
      • for each event
      • over one year
      • over project life

   2. Characteristics assessed
      • language of presentations
      • quantity (number of hours per year, etc.)
      • meeting needs of individuals
      • practicality
      • new information
      • expertise of presenter
      • meeting stated objectives
      • relevancy to program needs and resources
      • clarity
      • exchange of ideas
      • continuity
      • variety
      • degree of participant involvement

   3. Assessment methods/description techniques
      • rating scale
      • interviews with staff
      • questionnaire for recipients
      • simple description of training



B. Evaluation of effects on instructional staff

   1. Time frame
      • each event (ex: one-shot post-workshop assessment)
      • over one year
      • over project life

   2. Characteristics assessed (depends on nature of training)
      • classroom performance
      • degrees, certification, endorsement
      • knowledge and skills
      • attitudes
      • commitment
      • language skills
      • involvement with parents and community
      • roles of teachers, aides, volunteers
      • self-concept of instructors
      • management skills
      • evaluation skills
      • community

   3. Assessment methods/description techniques
      • classroom observation
      • videotape
      • test
      • rating scale
      • interview
      • questionnaire
      • tally
      • description
      • pre-post needs assessment



C. Evaluation of effects on students

   1. Time frame
      • each event (ex: one-shot post-workshop assessment or pre-post
        workshop assessment)
      • over year
      • over project life

   2. Characteristics assessed (depends on content of training)
      • self-direction of students
      • time on task
      • language use
      • interethnic interaction
      • cooperation in learning centers
      • motivation
      • student work production

   3. Assessment methods/description techniques
      • classroom observation
      • teacher questionnaire
      • teacher interview
      • tests
      • student interview
      • parental report

D. Evaluation of products resulting from training sessions (materials,
   record-keeping systems, etc.)

   1. Time frame
      • each event
      • over one year
      • over project life

   2. Characteristics assessed
      • quantity
      • quality
      • usefulness
      • relevance to curriculum

   3. Assessment methods/description techniques
      • list and description
      • rating scale
      • documentation of dissemination
      • documentation of extent of use



E. Evaluation of management of staff development component

   1. Time frame
      • each event
      • over year
      • over project life

   2. Characteristics assessed
      • project director's role
      • instructional coordinator's role
      • adequacy of planning and implementation
      • coordination with staff
      • cost effectiveness
      • inclusion of non-project personnel in project activities

   3. Assessment methods/description techniques
      • participants' questionnaire
      • rating scale
      • individual interviews



F. Evaluation of effects on non-project staff (including other
   districts)

   1. Time frame
      • each event (presentation, mailing, etc.)
      • over year
      • over project life

   2. Characteristics assessed
      • knowledge or awareness of project goals and methods
      • degree of coordination between project and non-project
        classrooms
      • attitudes toward bilingual education
      • interest in participating in project

   3. Assessment methods/description techniques
      • questionnaire
      • list and description
      • record of number of visitors to project
      • record of number of requests for information about project from
        neighboring districts
      • documentation of extent of dissemination effort






EVALUATION OF PARENT/COMMUNITY INVOLVEMENT

B-3

Common Practices



Most Project Information Package (PIP) tryout sites documented and
reported events sponsored for or by parents and community members.
Whether or not changes came about because of parent/community
participation was often not addressed. Little or no attention was given
to examining the effects of this component on the school, the students,
or the community itself. The following evaluation approaches were by far
the most common:

   • reporting attendance at parent advisory committee (PAC) meetings,
     and presenting minutes and a list of accomplishments;

   • describing parent workshops and parent education sessions, and
     reporting attendance;

   • documenting efforts to disseminate information about the school
     and the project to parents and community;

   • documenting home visits by staff and parent/teacher conferences.

A limited number of sites employed additional evaluation techniques,
including the following:

   • use of a pre-post questionnaire to measure parents' gains in
     knowledge of bilingual education, and attitude toward the program;

   • documentation of parent activities in the school (as tutors, field
     trip supervisors, etc.);

   • list of products of parent/community workshops (instructional
     games, cassette recordings, newsletter, etc.);

   • parent questionnaire to assess the value of their participation in
     school activities;

   • parent questionnaire to assess whether or not information was
     received about the project and about the project evaluation;

   • questionnaire addressed to the PAC to assess strengths and
     weaknesses of the bilingual education project;

   • survey to assess the child's home language use.






Recommendations



To a large extent, the success or failure of a program is determined by
the contextual features that characterize it. Parent/community (P/C)
support of a bilingual education program can be a great asset in helping
the program gain advocates and support. For this reason, the first
recommendation is to document and report the type of community support a
program received throughout the various stages of program development
(planning stage, implementation). The amount of support a program
receives initially can be a predictor of the type of support it will
receive throughout its life, unless some community feature changes
dramatically. Once the schools where the program will be housed are
selected, it is recommended that some historical data be collected on
the extent of P/C support that existed prior to the program's inception.
This information can be used as a comparison in documenting the change
in community support over time.
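Once a pre-program baseline exists, the change in support can be reported year by year. The sketch assumes a single numeric indicator (for example, average parent attendance at school events) measured the same way before and during the program; the indicator, the years, and the function name are illustrative assumptions.

```python
def change_from_baseline(baseline, yearly):
    """Change in a P/C support indicator relative to pre-program data.

    baseline: indicator value before the program began.
    yearly: dict mapping school year -> indicator value during the program.
    Returns a dict, in year order, mapping year -> (absolute change,
    percent change), with percent change None when the baseline is 0.
    """
    out = {}
    for year, value in sorted(yearly.items()):
        diff = value - baseline
        pct = 100.0 * diff / baseline if baseline else None
        out[year] = (diff, pct)
    return out
```

For example, a baseline of 20 and values of 25 and 32 in the next two years give changes of +5 (25%) and +12 (60%).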

A second recommendation is that realistic, meaningful short-term and
long-term objectives be written to define the expected school-community
relationship. P/C participation in this activity is essential, since it
will outline their commitment to the school as well as their
expectations of the school. Assurances ought to be made that minority
P/C participation will occur, since this is the target population of the
bilingual education project and since compliance with federal guidelines
is a goal in itself.

A third recommendation is to plan processes and activities that will
produce the desired outcomes specified in the goals. The formation of a
PAC, production of an activities calendar, and formation of standing
committees (for hiring, curriculum, evaluation, etc.) are examples of
processes that will achieve some of the short-term goals specified.
Parents' actual participation in the classroom and cultural
instructional units prepared by parents are examples of processes that
may contribute to achieving some of the desired long-term goals.






A fourth recommendation is one that is presently being addressed by most
sites: to document the array of activities that take place throughout
the school year that are of significance to the school-community
marriage. Section A-2 of the following outline lists a variety of P/C
activities common to bilingual education programs. This list is by no
means exhaustive; however, it categorizes activities in a systematic
manner so that it is possible to identify the gaps and weaknesses as
well as the strengths in a program. P/C activities and characteristics
are grouped by domains such as management, curriculum, and parent
advisory committee.

A thorough evaluation of the P/C component requires going one step
beyond the documentation of activities. It requires an attempt to answer
the question, "What are the effects of the P/C component on the P/C
itself, on the students, and on the school?" The goals mentioned earlier
should specify the changes expected to be produced in each of these
areas. The next question to ask is, "How will these changes be
manifested?" The answer to this question will determine the choice of
the assessment method and time frame most appropriate for each area to
be evaluated. Sections B, C, and D of the outline address these
questions and offer suggestions for selecting characteristics to be
assessed, assessment methods, and a time frame.






APPROACHES TO EVALUATION OF PARENT INVOLVEMENT COMPONENT 

A. Parent/Community Involvement Activities and Characteristics

   1. Descriptive Information on P/C Participation
      a. Historical parent involvement at the school selected to house
         the bilingual program (a comparison standard)
      b. Type and amount of community involvement at the
         selection/adoption stage in trying to get the Title VII grant
      c. Amount of time devoted to P/C affairs by staff liaison; source
         of funds for position
      d. Paid or volunteer positions held by parents (community liaison,
         teacher aides, etc.)

2* List of Parent /Community Involvement Activities 

a. Management

• forming a standing committee on staff hiring

• planning calendar of school events (holidays, plays, carnivals,
open house, etc.)

• planning student progress reporting procedures

b. Curriculum planning activities

• goals and objectives

• materials selection

• cultural component

• first and second language use plan

• extracurricular activities

• planning parent classroom participation

c. Classroom involvement

• parent function (tutor, clerical, PAC, parent education,
etc.)

• language used in activity (English, other)

• duration of parent participation

• contribution of parent (helpful, informative, entertaining,
productive)

• relevance to program objectives

• continuity






d. Parent Advisory Committee activities and characteristics

• parent input to PAC constitution and rules and regulations

• officers elected vs. appointed (duration of term, qualifications,
appointed or elected by whom)

• responsibilities, powers, and limits of PAC

• participation of project members (numbers, percentages
involved in action committees)

• participation of minority as compared with non-minority
parents

• participation of community organizations and/or individuals

• parent in-service and parent education

• PAC budget

e. Reporting and evaluation activities

• PAC standing committee on evaluation

• classroom visitation (frequency, duration, purpose)

• parent training on evaluation

• parents' involvement in testing

• PAC's evaluation effort as reflected in yearly evaluation
report

B. Evaluation of Effects of Parent Participation on Parents and Community

1. Time Frame

• each day

• each event (curriculum unit, parent educational course,
field trip, etc.)

• pre-post (yearly)

• longitudinal (program's duration)

2. Characteristics Assessed

• participants' performance (as PAC members, tutors, etc.)

• degrees, certification, awards, etc.

• knowledge of project and skills acquired

• attitudes towards project

• commitment (actual participation, support)

• role of parents in school affairs

• self-concept of parents






3. Assessment Methods

• classroom observation

• test

• questionnaire

• rating scale

• interview

• tally

C. Evaluation of Effects of Parent Participation on Students

1. Time Frame

• each event

• pre-post yearly assessment

• over students' program participation

2. Characteristics Assessed

• students' change in discipline

• time on task

• language usage

• inter-ethnic interaction

• motivation

• student work production

• attitude

• absenteeism

• retentions

3. Assessment Methods

• classroom observation

• teacher questionnaire

• tests

• student interview

• parental report

• count/tally of students' work production






D. Evaluation of Effects of Parent Participation on School

1. Time Frame

• each event

• pre-post yearly

• historical (pre-project vs. current)

• over project life

2. Characteristics Assessed

• staff characteristics

• teachers' classroom performance

• classroom ambience

• parent-school communications

• language usage in school

• inter-ethnic interaction

• curriculum appropriateness

• school budget

• project evaluation

3. Assessment Methods

• classroom observation

• rating scale

• questionnaire

• tally

• description

• pre-post needs assessment






REFERENCES 



American Institutes for Research. Evaluation of the impact of ESEA Title
VII Spanish/English bilingual education programs. Volume I: Study
design and interim findings. Palo Alto, CA: AIR, February 1977.

Bissell, J. S. Program evaluation as a Title VII management tool. Forth-
coming. Los Alamitos, CA: Southwest Regional Laboratory for Educa-
tional Research and Development (SWRL).

Bye, T. T. Tests that measure language ability: A descriptive compila-
tion. Berkeley, CA: BABEL/LAU Center, 1977.

Center for the Study of Evaluation. CSE summative evaluation kit. Los
Angeles, CA: University of California, 1975.

Center for Applied Linguistics. Bilingual education: Current perspectives
(5 vols.). Arlington, VA: CAL, 1977-78.

Dieterich, T. G., Freeman, C., & Crandall, J. A. A linguistic analysis
of some English proficiency tests. Paper presented at the National Asso-
ciation for Bilingual Education Conference, Seattle, WA, May 1979.

Dissemination and Assessment Center for Bilingual Education. Evaluation
instruments for bilingual education: An annotated bibliography.
Austin, TX: DACBE, 1976.

Gilmore, G., & Dickerson, A. The relationship between instruments used
for identifying children of limited English speaking ability in
Texas. Houston, TX: Region I, 1979.

Hoepfner, R. Achievement test selection for program evaluation. In
Wargo, M. J., and Green, D. R. (Eds.), Achievement testing of dis-
advantaged and minority students for educational program evaluation.
CTB/McGraw-Hill, 1977.

Horst, D. P., Tallmadge, G. K., & Wood, C. T. A practical guide to mea-
suring project impact on student achievement. Washington, DC: U.S.
Government Printing Office, 1975.

Houts, P. L. The myth of measurability. New York: Hart Publishing
Company, Inc., 1977.

Hubert, J. An investigation of the Language Assessment Battery (English,
Level I) for Title VII students in Hartford. Hartford, CT: 1978.

Law, A. Proceedings of the Bilingual Instrument Review Committee (AB
3470). Sacramento, CA: Office of Program Evaluation and Research,
California State Department of Education, September 28, 1976.

Loret, P. G., et al. Anchor test study. Washington: U.S. Government
Printing Office, 1974.






Mackey, W. F., & Beebe, V. N. Bilingual schools for a bicultural com-
munity: Miami's adaptation to the Cuban refugees. Rowley, MA:
1977.

Northwest Regional Educational Laboratory. Oral language tests for bi-
lingual students: An evaluation of language dominance and profi-
ciency instruments. Portland, OR: NWREL, 1976.

Northwest Regional Educational Laboratory, Center for Bilingual Educa-
tion. Assessment instruments in bilingual education: A descrip-
tive catalogue of 342 oral and written tests. Los Angeles, CA:
National Dissemination and Assessment Center, 1978.

Pletcher, B. P., Locks, N. A., Reynolds, D. F., & Sisson, B. G. A guide
to assessment instruments for limited English speaking students. New
York, NY: Santillana Publishing Company, Inc., 1978.

Rhodes-Hoover, M., Politzer, R. L., & Taylor, O. Bias in achievement
and diagnostic reading tests: A linguistically oriented view. Un-
published manuscript, Stanford University, 1975.

Roberts, A. O. H. Thresholds and decisions. Mountain View, CA: RMC
Research Corporation, June 1977.

Roberts, A. O. H. Out-of-level testing. Mountain View, CA: RMC Re-
search Corporation, 1978.

Spolsky, B. (Ed.). Advances in language testing series: 1 - Approaches
to language testing. Papers in Applied Linguistics. Arlington, VA:
Center for Applied Linguistics, 1978.

Spolsky, B. (Ed.). Advances in language testing series: 2 - Some major
tests. Papers in Applied Linguistics. Arlington, VA: Center for
Applied Linguistics, 1978.

Tallmadge, G. K. Cautions to evaluators. In Wargo, M. J., and Green,
D. R. (Eds.), Achievement testing of disadvantaged and minority
students for educational program evaluation. CTB/McGraw-Hill,
1977.

Tallmadge, G., & Wood, C. T. User's guide: ESEA Title I evaluation
and reporting system (Rev. ed.). Mountain View, CA: RMC Research
Corporation, 1976.

Texas Education Agency. Report from the committee for the evaluation of
language assessment instruments, 1977.

Wargo, M. J., & Green, D. R. Achievement testing of disadvantaged and
minority students for program evaluation. CTB/McGraw-Hill, 1977.






